Saving and restoring non-contiguous blocks of preserved registers

ABSTRACT

Described herein are instruction set architectures (ISAs), and related data processing apparatuses and methods, with two or more non-contiguous blocks of preserved registers wherein the registers to be saved or restored are identified in a save or restore instruction via a number of registers to be saved/restored (Num_Reg) and a starting register (rStart). Specifically, in the ISAs, apparatuses, and methods described herein, the registers to be saved or restored are identified as the Num_Reg registers in a predetermined sequence starting with rStart wherein, in the predetermined sequence, each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register.

RELATED APPLICATIONS

This application claims the benefit of U.S. provisional patent applications “Implicit Global Pointer Relative Addressing for Global Memory Access” Ser. No. 62/552,855, filed Aug. 31, 2017, “Unified Logic” Ser. No. 62/552,796, filed Aug. 31, 2017, “Pointer-Size Controlled Instruction Processing” application Ser. No. 62/552,841, filed Aug. 31, 2017, “Saving and Restoring Non-Contiguous Blocks of Preserved Registers” Ser. No. 62/552,830, filed Aug. 31, 2017, and “Unaligned Memory Accesses” Ser. No. 62/558,930, filed Sep. 15, 2017. Each of the foregoing applications is hereby incorporated by reference in its entirety.

FIELD OF ART

This application relates generally to saving and restoring registers and more particularly to saving and restoring non-contiguous blocks of preserved registers.

BACKGROUND

People regularly interact with a wide variety of electronic systems. Common electronic systems include computers, smartphones, and tablet computers, while other electronic systems now appear in many familiar items, ranging from household appliances to vehicles. These electronic systems include integrated circuits or “chips” which, depending on the system in which the chips are used, can range from simple to highly complex. The chips are designed to perform a wide variety of system functions, and to enable the systems to perform their functions effectively and efficiently. The chips are built using highly complex circuit designs, architectures, and system implementations. The chips are, quite simply, integral to the electronic systems. The chips are designed to implement system functions such as user interfaces, communications, processing, and networking. These system functions are applied to electronic systems used for business, entertainment, or consumer electronics purposes. The electronic systems routinely contain more than one chip. The chips implement critical system functions including computation, storage, and control. The chips support the electronic systems by computing algorithms and heuristics, handling and processing data, communicating internally and externally to the electronic system, and so on. Since the chips are required to perform a large number of computations and functions, any improvements in chip efficiency can contribute to a significant and substantial impact on overall system performance. As the amount of data to be handled increases, the approaches that are used must not only be effective, efficient, and economical, but must also scale as the amount of data increases.

Single processor architectures based on chips are well suited for some computational tasks, but are unable to achieve the high performance levels which are required by some high-performance systems. Multiple single processors can be used together to boost performance. Parallel processing based on general-purpose processors can attain an increased level of performance, thus parallelism is one approach for achieving increased performance. There is a wide variety of applications that demand high performance levels. Common applications requiring high performance include networking, image and signal processing, and large simulations, to name but a few. In addition to computing power, chip and system flexibility are important for adapting to ever-changing computational needs and technical situations.

System or chip reconfigurability is another approach that can address application demands. The system or chip attribute of reconfigurability is critical to many processing applications, as reconfigurable devices are extremely efficient for specific processing tasks. In certain circumstances, the cost and performance advantages of reconfigurable devices exist because the reconfigurable or adaptable logic enables program parallelism, which allows multiple computation operations to occur simultaneously. By comparison, conventional processors are often limited by instruction bandwidth and execution rate restrictions. Note that the high-density properties of reconfigurable devices can come at the expense of the high-diversity property that is inherent in other electronic systems, including microprocessors. Microprocessors have evolved to highly-optimized configurations that provide cost/performance advantages over reconfigurable systems for tasks that require high functional diversity. However, there are many tasks for which a conventional microprocessor is not the best design choice. A system architecture that supports configurable, interconnected processing elements can be an excellent alternative for many data-intensive applications such as Big Data.

SUMMARY

Described herein are instruction set architectures, and related data processing apparatuses and methods, with two or more non-contiguous blocks of preserved registers wherein the registers to be saved or restored are identified in a save or restore instruction via a number of registers to be saved/restored (Num_Reg) and a starting register (rStart). Specifically, in instruction set architectures, apparatuses, and methods described herein, the registers to be saved or restored are identified as the Num_Reg registers in a predetermined sequence starting with rStart wherein, in the predetermined sequence, each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register.

A processor-implemented method of saving registers to a stack is disclosed comprising: providing access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receiving an instruction for execution by an execution unit; determining whether the received instruction is a save instruction, the save instruction identifying a number of registers and a starting register within the plurality of numbered registers; and outputting one or more control signals to cause the execution unit to store to a stack the identified number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register. In some embodiments, the outputting one or more control signals to cause the execution unit to store to the stack the identified number of registers in a predetermined order beginning with the starting register comprises outputting, for each iteration of a number of iterations, one or more control signals to cause the execution unit to store one register at an address of the stack, wherein the number of iterations is equal to the identified number of registers.

A processor-implemented method of restoring registers from a stack is disclosed comprising: providing access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receiving an instruction for execution by an execution unit; determining whether the received instruction is a restore instruction, the restore instruction identifying a number of registers and a starting register within the plurality of numbered registers; and outputting one or more control signals to cause the execution unit to restore from a stack the identified number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register. In some embodiments, the outputting one or more control signals to cause the execution unit to restore from the stack the identified number of registers in a predetermined order beginning with the starting register comprises outputting, for each iteration of a number of iterations, one or more control signals to cause the execution unit to store one register at an address of the stack, wherein the number of iterations is equal to the identified number of registers.

Embodiments include a data processing apparatus comprising: a register set comprising a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; an execution unit; and a decode unit configured to: receive an instruction for execution by the execution unit; determine whether the received instruction is a save instruction, the save instruction identifying a number of registers and a starting register within the plurality of numbered registers; and output one or more control signals to cause the execution unit to store to a stack the number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.

Embodiments further include a data processing apparatus comprising: a register set comprising a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; an execution unit; and a decode unit configured to: receive an instruction for execution by the execution unit; determine whether the received instruction is a restore instruction, the restore instruction identifying a number of registers and a starting register within the plurality of numbered registers; and output one or more control signals to cause the execution unit to restore from a stack the number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.

Data processing apparatuses may be embodied in hardware on an integrated circuit. There may be provided a method of manufacturing, at an integrated circuit manufacturing system, a data processing apparatus. There may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the system to manufacture the data processing apparatus. There may be provided a non-transitory computer readable storage medium having stored thereon a computer readable description of a data processing apparatus that, when processed in an integrated circuit manufacturing system, causes the integrated circuit manufacturing system to manufacture an integrated circuit embodying the data processing apparatus. The data processing apparatus may be implemented as part of and/or referred to as a processor, a processor chip, a processor module, and so on.

Various features, aspects, and advantages of the various embodiments will become more apparent from the following further description.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description of certain embodiments may be understood by reference to the following figures wherein:

FIG. 1 is a schematic diagram of a main program and a subroutine called thereby.

FIG. 2 is a schematic diagram of an example call stack.

FIG. 3 is a block diagram of an example data processing apparatus for saving and/or restoring non-contiguous blocks of preserved registers.

FIG. 4 is a schematic diagram illustrating an example of the predetermined sequence.

FIG. 5 is a flow diagram of an example method of storing and/or restoring non-contiguous blocks of preserved registers.

FIG. 6 is a block diagram of an example conversion unit.

FIG. 7 is a flow diagram of an example method of converting a save instruction or a restore instruction into a plurality of micro instructions.

FIG. 8 is a schematic diagram illustrating a first example conversion of a save instruction into a plurality of store instructions.

FIG. 9 is a schematic diagram illustrating a second example conversion of a save instruction into a plurality of store instructions.

FIG. 10 is a schematic diagram illustrating a first example conversion of a restore instruction into a plurality of load instructions.

FIG. 11 is a schematic diagram illustrating a third example conversion of a save instruction into a plurality of store instructions.

FIG. 12 is a schematic diagram illustrating example formats of save instructions.

FIG. 13 is a schematic diagram illustrating example formats of restore instructions.

FIG. 14 is a block diagram of an example integrated circuit manufacturing system for generating an integrated circuit embodying a data processing apparatus as described herein.

FIG. 15 is a flow diagram for saving registers to a stack.

FIG. 16 is a flow diagram for restoring registers from a stack.

FIG. 17 is a system diagram for saving registers to a stack.

DETAILED DESCRIPTION

The following description is presented by way of example to enable a person skilled in the art to make and use the invention. The present invention is not limited to the embodiments described herein and various modifications to the disclosed embodiments will be apparent to those skilled in the art. Embodiments are described by way of example only.

In order to make computer programs more manageable, they are often broken up into smaller pieces called subroutines that are designed to perform one or more tasks. A main program is then configured to call or invoke the appropriate subroutine as needed to perform a particular task. A called subroutine may itself call other subroutines. For example, as shown in FIG. 1, when a main program 102 wants to perform task A it may call subroutine A 104, which may itself call subroutine B 106. The program or subroutine that invokes a subroutine is referred to as the caller and the subroutine that is invoked is referred to as the callee.

The code representing a main program is typically stored in a different part of memory from the code representing its subroutines, so calling or invoking a subroutine typically comprises first jumping to a different section of memory where the code for the subroutine is located and then executing the code at that location. When the subroutine has completed its task, the computer returns to the main program by jumping back to the address of the next caller instruction to execute (which is referred to herein as the return address).

Since a computer typically has one set of registers 108, the main program 102 and the subroutines 104, 106 all have access to the same set of registers 108 for storing and transferring data. This provides an easy way for the caller and callee to exchange data. For example, it provides an easy way for the caller to provide arguments or parameters to the callee and for the callee to return results to the caller.

However, the shared set of registers can also cause some problems if a subroutine modifies a register value that the caller was not expecting to be modified. For example, a main program may have placed a value of 4 in register 5, it may then call a subroutine, and after the subroutine has completed, the main program may read register 5 expecting to see a value of 4. However, the subroutine may place a value of 2 in register 5 such that, after the subroutine has completed, the main program may read the value of 2 from register 5 (instead of the expected value of 4), which may cause the main program to behave in an unexpected manner. Accordingly, to ensure that the caller knows which registers will be maintained or preserved throughout a called subroutine, in most computer architectures the registers are divided into those that will be preserved throughout a subroutine and those that will not be preserved. Then, if a subroutine wants to use any of the preserved registers, it must save the value of these registers at the beginning of the subroutine and restore the value of these registers at the end of the subroutine.
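The register 5 scenario above can be modeled in a few lines of Python. This is only an illustrative sketch: the register file is modeled as a plain list, and the names `careless_subroutine` and `careful_subroutine` are hypothetical, not taken from any ISA.

```python
# Toy model of a shared register file: one list shared by caller and callee.
registers = [0] * 32


def careless_subroutine(regs):
    """Overwrites register 5 without saving it first."""
    regs[5] = 2


def careful_subroutine(regs):
    """Saves register 5 on entry and restores it on exit."""
    saved = regs[5]   # save at the beginning of the subroutine
    regs[5] = 2       # use the register for the subroutine's own work
    regs[5] = saved   # restore at the end of the subroutine


registers[5] = 4
careless_subroutine(registers)
value_after_careless = registers[5]   # the caller's value has been lost

registers[5] = 4
careful_subroutine(registers)
value_after_careful = registers[5]    # the caller's value is intact
```

In the first call the main program reads back 2 instead of 4; in the second it reads back the expected 4.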

For example, some systems traditionally use 32 registers numbered 0 to 31, which are defined as per Table 1.

TABLE 1

Register   Name        Purpose
r0         $zero       zero
r1         $at
r2-r3      $v0-$v1     Function return variables
r4-r7      $a0-$a3     Function arguments
r8-r15     $t0-$t7     Temp registers (caller save)
r16-r23    $s0-$s7     Saved (callee save)
r24-r25    $t9-$t10    Temp registers (caller save)
r26-r27    $k0-$k1     reserved for kernel
r28        $gp         global pointer
r29        $sp         stack pointer
r30        $fp         frame pointer
r31        $ra         return address

In this example, registers 16 to 23 are marked as saved registers and must be saved and restored by the callee if they are to be modified by the callee, and registers 28, 30, and 31 (the global pointer (GP) register, the frame pointer register, and return address register, respectively) also must be saved and restored by the callee if they are to be modified. If the caller wants any of the remaining registers to be preserved throughout the subroutine, then it is the responsibility of the caller to save these registers. Generally, the preserved registers are stored in a stack.

Specifically, in many computer architectures a call stack is used to implement subroutine calls and returns. As is known to a person of skill in the art, a stack is a data structure used to store a collection of objects. Individual items can be added to the stack using a push operation. Objects can be retrieved from the stack using a pop operation. It is possible to implement stacks that grow “up” (i.e., the newest data is added at a higher address) or grow “down” (i.e., the newest data is added at a lower address). For the purposes of illustration, the stacks described herein will be described as growing “down”; however, it will be evident to a person of skill in the art that the methods and techniques described herein may be equally applied to stacks that grow “up”.

Generally, as shown in FIG. 2, each subroutine creates a stack frame 202 (which may be referred to as the current stack frame) when it is called, at a location on the stack 200 just below the caller's stack frame 203 (which may be referred to as the previous stack frame). A stack frame 202 may be divided into three sections: a saved register section 204 used to store the values of the preserved registers (e.g., registers $s0-$s7, $gp, $fp, $ra), a local data section 206 used for local variable storage, and an outgoing argument section 208 that contains space to store arguments that are passed to any subroutines that are called by the current subroutine. Not all stack frames will have all three sections. For example, the stack frame for a subroutine that does not itself call any subroutines (e.g., a leaf subroutine) may not have an outgoing argument section 208. The stack pointer ($sp) points to the base of the current stack frame and in some cases, there may be a frame pointer ($fp) that points to the top of the current stack frame.

When a subroutine is called, the stack pointer will point to the current bottom of the stack (i.e., the bottom of the previous stack frame). A save operation is then performed to store to the stack any preserved registers that will be modified by the subroutine. The save operation involves identifying the registers to be saved and the address in the stack where they should be stored (e.g., as offsets from the current stack pointer). The stack pointer is then adjusted to point to the bottom of the stack (i.e., the bottom of the current stack frame). For example, the stack pointer is adjusted by the size of the current stack frame (i.e., the number of bytes to store the preserved registers + the number of bytes reserved for local variables + the number of bytes reserved for outgoing arguments). At the end of a subroutine, a restore operation is performed to restore the saved preserved registers. The restore operation involves identifying the registers to be restored and the address in the stack where they were stored (e.g., as offsets from the current stack pointer). The stack pointer is then adjusted to point to the bottom of the previous stack frame. Since subroutine calls are very common, it is desirable that the saving and restoring of registers is implemented in an efficient manner.
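The stack pointer arithmetic described above can be sketched as follows for a stack that grows down. The function names, the 4-byte register width, and the example sizes and addresses are assumptions chosen purely for illustration.

```python
def frame_size(num_saved_regs, local_bytes, outgoing_arg_bytes, reg_bytes=4):
    """Bytes for saved registers + bytes for locals + bytes for outgoing arguments."""
    return num_saved_regs * reg_bytes + local_bytes + outgoing_arg_bytes


def enter_subroutine(sp, size):
    """The stack grows down, so creating a frame lowers the stack pointer."""
    return sp - size


def leave_subroutine(sp, size):
    """Releasing the frame moves the stack pointer back to the previous frame."""
    return sp + size


# Hypothetical frame: 3 saved registers, 16 bytes of locals, 8 bytes of arguments.
size = frame_size(num_saved_regs=3, local_bytes=16, outgoing_arg_bytes=8)
sp_on_entry = 0x8000
sp_in_body = enter_subroutine(sp_on_entry, size)
sp_on_exit = leave_subroutine(sp_in_body, size)
```

With these example values the frame occupies 36 bytes, and the stack pointer returns to its entry value once the frame is released.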

The embodiments described below are provided by way of example only and are not limiting of implementations that solve any or all of the disadvantages of known instruction set architectures, methods, and systems for saving and restoring registers.

Some instruction set architectures (ISAs) include save and restore instructions, which, when executed by a data processing apparatus implementing the ISA, cause the data processing apparatus to save or restore the appropriate preserved registers. Specifically, when a save or restore instruction is decoded by a data processing apparatus, it is converted into a series of simpler instructions (e.g., a series of store instructions in the case of a save instruction, and a series of load instructions in the case of a restore instruction) that cause the data processing apparatus to save or restore the appropriate registers. Instructions, such as save and restore instructions, that are automatically expanded into a set of simpler instructions (e.g., loads and stores) are referred to as macro instructions, and the simpler instructions (e.g., loads and stores) that they are converted into are referred to as micro instructions.

Since macro instructions, such as save and restore instructions, are single instructions that represent a plurality of micro instructions, they effectively improve the code density of programs. As is known to those of skill in the art, the term “code density” describes the amount of space that the executable code for a program takes up in memory. This may also be referred to as the “memory footprint” of the program. The denser the code, the less space the code takes up in memory, and conversely, the less dense the code, the more space the code takes up in memory. The code density of a program is a function of the ISA implemented by the data processing apparatus that executes the program and is typically based on both the number of instructions to perform each action of the program and the number of bits per instruction. Generally, the fewer instructions required to perform actions and the fewer bits per instruction, the denser the code. Accordingly, macro instructions typically improve code density, because they allow one executable instruction to replace multiple executable instructions, thus reducing the number of executable instructions used to represent a program. Code density is particularly important when the program is to be executed by a data processing apparatus with a limited amount of memory, such as a mobile telephone or other embedded systems.
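As a back-of-the-envelope illustration of the code density argument, the sketch below compares the code size of saving eight registers with eight separate store instructions against a single save macro instruction. The fixed 4-byte instruction width and the function name are assumptions for illustration; real encodings vary by ISA.

```python
def code_size_bytes(num_instructions, bytes_per_instruction=4):
    """Code size when every instruction has the same (assumed) width."""
    return num_instructions * bytes_per_instruction


with_separate_stores = code_size_bytes(8)   # eight individual store instructions
with_macro_instruction = code_size_bytes(1) # one save macro instruction
bytes_saved = with_separate_stores - with_macro_instruction
```

Under these assumptions, each subroutine prologue shrinks from 32 bytes to 4 bytes, and the same saving applies again to the epilogue.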

As described above, the plurality of registers of a data processing apparatus are divided into preserved registers and non-preserved registers. When a subroutine is called, it is expected that the subroutine will save any preserved registers it is going to use at the beginning of the subroutine and restore them at the end of the subroutine. To achieve this, the subroutine will cause a save instruction to be executed on entry and a restore instruction to be executed on exit. When the decode unit of a data processing apparatus receives a save instruction or a restore instruction, the decode unit identifies, from one or more arguments of the save instruction or restore instruction, the list of preserved registers to be saved or restored and the addresses of the stack in which the identified registers are to be saved to or restored from, then converts the save or restore instruction into a plurality of store or load instructions that cause the identified registers to be saved to or restored from the identified addresses of the stack.
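The conversion performed by the decode unit can be pictured with the short Python sketch below, which turns a list of registers into one store or load micro instruction per register. The tuple representation, the `expand` function, the 4-byte register width, and the choice of descending offsets from the stack pointer are all hypothetical; they are not the specific conversion described with reference to the figures.

```python
def expand(op, registers, first_offset, reg_bytes=4):
    """Expand a macro instruction into one (op, register, offset) tuple per register.

    Assumed layout: the first register uses first_offset (relative to the
    stack pointer) and each later register uses the next lower slot.
    """
    return [(op, reg, first_offset - i * reg_bytes)
            for i, reg in enumerate(registers)]


# A save of registers 30 and 31 becomes two stores ...
store_ops = expand("store", [30, 31], first_offset=-4)
# ... and the matching restore becomes two loads from the same slots.
load_ops = expand("load", [30, 31], first_offset=-4)
```

The restore expansion mirrors the save expansion, so each register is loaded from the slot it was stored to.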

It has proved difficult in some architectures, such as where the preserved registers do not form a continuous block of registers (e.g., registers 16-23, 28, 30 and 31), to identify the registers to be saved/restored in an efficient manner. For example, in some ISAs, a table of different fixed combinations of the preserved registers is maintained and the save or restore instruction identifies the registers to be saved or restored via an index to the table. Such a means of identifying the registers to be saved or restored is inflexible and cannot be implemented efficiently in hardware.

One way to address this problem would be to reallocate the registers in the register set so that the preserved registers form a continuous block of registers (e.g., a block of consecutively numbered registers). This would mean that some registers would be used for different purposes than in previous versions of the ISA. For example, where in a previous allocation register 24 was a temporary register that was not saved, in a new allocation register 24 may be a preserved register. This inconsistency between register conventions in the new ISA and the old ISA can create a number of problems. First, the magnitude of effort required to port programs, particularly programs written using assembly language, from the old ISA to the new ISA may be significantly increased by the need to reallocate registers to match the new convention. Second, reallocating registers with a specific hardware meaning, such as the return address register, may increase the difficulty of creating a data processing apparatus (e.g., processor) capable of running both the new ISA and the old ISA, since the return address would need to be stored in a different register depending on the ISA mode in which the data processing apparatus is operating.

Accordingly, described herein are ISAs, and related data processing apparatuses and methods, with two or more non-contiguous blocks of preserved registers wherein the registers to be saved or restored are identified in a save or restore instruction via a number of registers to be saved/restored (Num_Reg) and a starting register (rStart). Specifically, in the ISAs, apparatuses, and methods described herein, the registers to be saved or restored are identified as the Num_Reg registers in a predetermined sequence starting with rStart wherein, in the predetermined sequence, each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register. For example, where the preserved registers are 16-23, 28, and 30-31 as in the examples described with reference to Table 1, Num_Reg is 2, and rStart is 20, the registers to be saved or restored are registers 20 and 21; and where Num_Reg is 3 and rStart is 31, the registers to be saved or restored are registers 31, 16, and 17, since 31 is the highest numbered preserved register and 16 is the lowest numbered preserved register. Allowing the registers to be identified as a sequence makes it simple for the hardware to decode and implement such save and restore instructions. Furthermore, since the register allocation of known ISAs is maintained, an implementation of such an ISA can be backward compatible.
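The predetermined sequence can be expressed compactly in software. The Python sketch below is a literal reading of the rule stated above, using the preserved registers of the Table 1 example; the function and constant names are hypothetical, and the sketch assumes rStart does not exceed the highest numbered preserved register.

```python
# Preserved registers from the Table 1 example: 16-23, 28, 30, and 31.
PRESERVED = list(range(16, 24)) + [28, 30, 31]
HIGHEST_PRESERVED = max(PRESERVED)
LOWEST_PRESERVED = min(PRESERVED)


def next_in_sequence(reg):
    """Next highest numbered register, wrapping from the highest numbered
    preserved register back to the lowest numbered preserved register."""
    if reg == HIGHEST_PRESERVED:
        return LOWEST_PRESERVED
    return reg + 1


def registers_to_transfer(r_start, num_reg):
    """The num_reg registers in the predetermined sequence starting at r_start."""
    regs = []
    reg = r_start
    for _ in range(num_reg):
        regs.append(reg)
        reg = next_in_sequence(reg)
    return regs


example_a = registers_to_transfer(20, 2)   # [20, 21]
example_b = registers_to_transfer(31, 3)   # [31, 16, 17]
```

Because only a counter and a single wrap-around comparison are needed, no per-register encoding or lookup table is required to identify the registers.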

As is known in the art, different ISAs can employ different nomenclature or mnemonics for describing the concepts of storing a register and restoring a register. For example, load and store can be used to describe loading data into a register and storing data from a register, respectively. Additionally, storing and restoring can involve one data word (of whatever data size is employed) or a sequence of data words. Examples can include load multiple word (LMW) and store multiple word (SMW), among others. Furthermore, different ISAs can utilize an explicit stack for some instructions and an implicit stack (not explicitly defined as a stack) for other instructions. For example, an LMW and/or SMW instruction may not use an explicit stack, but rather may use an encoded list of registers for loading from or storing to memory as defined in the instruction itself. In this type of embodiment, the stack is understood to be a virtual, or implicit, stack. Previous techniques included using an explicit encoding scheme that was more costly in terms of area and power. The counter-based scheme recognizes that the operations are done serially, so there is no advantage to knowing which registers are to be accessed. The address generation for save/restore is accomplished differently than for LMW/SMW, because the address generation references the stack.

Reference is now made to FIG. 3, which illustrates an example data processing apparatus 300 that implements an ISA like those described herein, which comprises an instruction set that includes one or more save instructions and/or one or more restore instructions identifying the registers to be stored or restored via the number of registers to be saved/restored (Num_Reg) and a starting register (rStart). A data processing apparatus is any device, machine, or dedicated circuit, such as, but not limited to, a processor, computer, or computer system, with processing capability such that it can execute instructions. A processor may be any kind of general-purpose or dedicated processor, such as a CPU, GPU, System-on-chip, state machine, media processor, an application-specific integrated circuit (ASIC), a programmable logic array, a field-programmable gate array (FPGA), or the like. A computer or computer system may comprise one or more processors.

The data processing apparatus 300 of FIG. 3 comprises a register file 302, a decode unit 304, and an execution unit 306. It will be evident to a person of skill in the art that the data processing apparatus 300 of FIG. 3 may comprise other components that are not shown such as, but not limited to, a fetch unit and input/output interface(s).

The register file 302 comprises a plurality of numbered registers, which can be written to, and read from, by the execution unit 306. In some examples, there may be 32 registers numbered 0 to 31; however, it will be evident to a person of skill in the art that this is an example only and that the register file 302 may comprise any number of registers.

The plurality of numbered registers comprises a plurality of preserved registers 308 and one or more non-preserved registers 310. The term “preserved register” is used herein to mean a register that a caller of a subroutine expects to be preserved throughout the subroutine. Accordingly, it is expected that a callee subroutine will save any preserved registers that it expects to use during the subroutine (including during any subroutine(s) it calls) at the beginning of the subroutine by executing a save operation and restoring those registers at the end of the subroutine by executing a restore operation.

The preserved registers 308 may include registers (e.g., registers 16-23 of Table 1) identified as saved registers, a return address register (e.g., register 31 of Table 1) and/or a frame pointer register (e.g., register 30 of Table 1). Saved registers, which may also be referred to as callee save registers, are registers that can be used by a program or subroutine to store and transfer data and are expected to be preserved throughout a subroutine call. The return address register is used to store the address to return to after the end of a subroutine. The return address is typically set by the caller of a subroutine to the address of the instruction after the calling instruction. For example, if a calling instruction (e.g., a jump subroutine instruction) is at address 0x1000, then when this instruction is executed the return address register may be set to 0x1004 where each instruction is four bytes long. The frame pointer register is used, in some cases, to point to the top of the current stack frame. Provided the frame layout meets certain conditions, the frame pointer register can be particularly useful for debugging. This is because the frame pointer register identifies the top of the current stack frame. The caller's frame pointer and return address can then be read from the current stack frame provided they are stored at fixed offsets from the top of the stack frame. Once the caller's frame pointer value is known, the caller's frame pointer and return address can be read from the caller's stack frame and so on, allowing the full call history of the current subroutine to be determined.
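The frame pointer walk described above can be illustrated with a minimal sketch. The sketch is not taken from the described apparatus: it assumes four-byte words, a return address stored at an offset of -4 from the top of each frame, the caller's frame pointer stored at an offset of -8, and a frame pointer value of zero marking the outermost frame. The names RA_OFFSET, FP_OFFSET, and backtrace are hypothetical, and memory is modeled as a simple dictionary.

```python
# Hypothetical frame layout (assumptions for illustration only):
# the saved return address sits 4 bytes below the top of the frame and
# the caller's saved frame pointer sits 8 bytes below the top of the frame.
RA_OFFSET = -4
FP_OFFSET = -8


def backtrace(memory, frame_pointer, max_depth=64):
    """Collect return addresses by following the chain of saved frame pointers.

    `memory` maps an address to the word stored there. A frame pointer of 0
    is assumed to mark the end of the chain.
    """
    return_addresses = []
    fp = frame_pointer
    while fp != 0 and len(return_addresses) < max_depth:
        return_addresses.append(memory[fp + RA_OFFSET])
        fp = memory[fp + FP_OFFSET]  # move to the caller's frame
    return return_addresses


# Two nested frames: the current frame tops out at 0x7F00 and its caller's
# frame tops out at 0x7F40.
memory = {
    0x7F00 + RA_OFFSET: 0x1004,  # return address into the caller
    0x7F00 + FP_OFFSET: 0x7F40,  # caller's frame pointer
    0x7F40 + RA_OFFSET: 0x2008,  # return address into the caller's caller
    0x7F40 + FP_OFFSET: 0,       # end of the chain
}
call_history = backtrace(memory, 0x7F00)
```

Here call_history holds the return addresses 0x1004 and 0x2008, from the innermost frame outwards. The walk only works because both values sit at fixed, known offsets in every frame, which is the condition on the frame layout noted above.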

The plurality of preserved registers 308 does not form a continuous block. In other words, the preserved registers are not continuously numbered. In particular, the plurality of preserved registers 308 comprise at least a first block 312 of continuously numbered registers and a second block 314 of continuously numbered registers wherein the first and second blocks, 312 and 314, are not contiguous with each other. In other words, there is a gap in the numbering between the registers in the first block 312 and the registers in the second block 314. For example, in the example register list shown in Table 1, the first block 312 of preserved registers may comprise the save registers/callee save registers (e.g., registers 16-23) and the second block 314 of preserved registers may comprise the frame pointer register and the return address register (e.g., registers 30 and 31). The term block is used herein to refer to one or more registers. Accordingly, a block may comprise a single register or multiple registers. A block of registers may alternatively be referred to as a set of registers.

The decode unit 304 is configured to receive computer executable instructions representing a program or subroutine that are based on an instruction set that comprises one or more save instructions and/or one or more restore instructions that identify the registers to be saved/restored via a number of registers to be saved/restored (Num_Reg) and a starting register (rStart). The computer executable instructions may be provided to the decode unit 304 by a fetch unit (not shown) configured to fetch instructions of a program or subroutine (in program/subroutine order) in memory 316 as indicated by a program counter (PC). Each instruction identifies one or more operations or tasks (e.g., load, store, add, subtract, jump, branch, save, restore) to be performed and none, one or more than one argument for the operation(s) (e.g., the operands). The decode unit 304 is configured to decode each received instruction to identify the operation(s) to be performed and output control signals to cause the execution unit 306 to perform the identified operation(s) according to the identified argument(s) (e.g., operands). Outputting the control signals may be referred to herein as providing the decoded instructions to the execution unit 306 for execution.

In particular, the decode unit 304 is configured to determine whether a received instruction is a save instruction or a restore instruction and, in response to detecting that a received instruction is a save instruction or a restore instruction, convert the save instruction or restore instruction into one or more sub-operations (e.g., loads or stores) based on the identified arguments and then output one or more control signals to cause the execution unit to perform the one or more sub-operations so as to save or restore none, one, or more than one register in the register file 302 to/from the stack 322.

In this example, the decode unit 304 comprises a detection unit 318 and a conversion unit 320. The detection unit 318 is configured to analyze received instructions to determine whether a received instruction is a save instruction or a restore instruction. In some cases, the detection unit 318 may be configured to identify a save instruction or a restore instruction by a unique bit pattern in the instruction. Example formats for save instructions and restore instructions that may be received by the detection unit 318 are described below with reference to FIGS. 12 and 13. If the detection unit 318 determines that the received instruction is a save instruction or a restore instruction, then the instruction is forwarded to the conversion unit 320.

The conversion unit 320 is configured to convert a received save instruction or restore instruction into none, one, or more than one load and/or store so as to save or restore none, one or more than one register in the register file 302 to/from the stack 322, and output one or more control signals to cause the execution unit to execute the load(s) or store(s). The conversion unit 320 of FIG. 3 is configured to identify the registers to be saved/restored from the number of registers to be saved/restored (Num_Reg) and the starting register (rStart) specified in the save instruction or the restore instruction. For example, as described below in reference to FIGS. 12 and 13, the save and restore instructions may have a field (e.g., a dedicated set of bits) that specifies the rStart argument and a field (e.g., a dedicated set of bits) that specifies the Num_Reg argument. The conversion unit 320 is configured to identify the registers to be saved or restored as the Num_Reg registers in a predetermined sequence beginning with rStart, wherein in the predetermined sequence each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register.

Examples of how the registers to be saved or restored may be identified by Num_Reg and rStart will be described with reference to FIG. 4. In these examples, the preserved registers comprise registers 16-23, 30, and 31, which can be divided into a first block of registers (registers 16-23) and a second block of registers (registers 30-31), wherein the first and second blocks are non-contiguous. In a first example 402, rStart is 31 and Num_Reg is 3, meaning that three registers in a predetermined sequence 404 beginning with register 31 are to be saved or restored. In the predetermined sequence 404, indicated by the arrows, each register (15-22, 30) is followed by the next highest numbered register (16-23, 31) except the highest numbered preserved register (31), which is followed by the lowest numbered preserved register (16). In other words, in the predetermined sequence, the registers are in increasing numerical order except the highest preserved register (31), which loops back to the lowest numbered preserved register (16). Therefore, according to the predetermined sequence 404, the two registers after 31 are 16 and 17; thus, the three registers that are saved or restored when rStart is 31 are registers 31, 16 and 17.

In a second example 406, rStart is 30 and Num_Reg is 10 meaning that ten registers in a predetermined sequence 404 beginning with register 30 are to be saved or restored. According to the predetermined sequence 404, the ten registers that are stored when rStart is 30 are registers 30, 31, 16, 17, 18, 19, 20, 21, 22, and 23 (i.e., all of the preserved registers).

The starting register (rStart) may be a preserved register (e.g., one of registers 16-23, 30-31) or may be a non-preserved register (e.g., one of registers 1-15, 24-29). For example, in a third example 408, rStart is 15 (a non-preserved register) and Num_Reg is 3, meaning that three registers in a predetermined sequence starting with register 15 are stored. According to the predetermined sequence 404, the three registers that are stored when rStart is 15 are registers 15, 16, and 17. Even when the starting register (rStart) is a preserved register (e.g., one of registers 16-23, 30-31), one or more of the registers in the predetermined sequence may be a non-preserved register (e.g., one of registers 1-15, 24-29). For example, if rStart is 23 (a preserved register) and Num_Reg is 3, then three registers in a predetermined sequence starting with register 23 are stored. According to the predetermined sequence 404, the three registers that are stored when rStart is 23 are registers 23 (a preserved register), 24 (a non-preserved register), and 25 (a non-preserved register). Allowing the starting register (rStart) or one or more other registers in the sequence to be non-preserved registers enables the subroutine to save and restore different registers. This allows for the possibility of changing the set of registers that are preserved. For example, it may be advantageous in some cases to have additional preserved registers (e.g. registers 24 and 25 can be preserved registers).
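The examples of FIG. 4 described above can be reproduced with a short sketch. It assumes the example numbering in which registers 16-23, 30, and 31 are preserved, so that 31 is the highest numbered preserved register and 16 is the lowest. The names next_in_sequence and select_registers are hypothetical and chosen only for this illustration.

```python
# Assumed numbering from the FIG. 4 examples: preserved registers are
# 16-23, 30 and 31.
LOWEST_PRESERVED = 16
HIGHEST_PRESERVED = 31


def next_in_sequence(reg):
    """Return the register that follows `reg` in the predetermined sequence."""
    if reg == HIGHEST_PRESERVED:
        return LOWEST_PRESERVED  # register 31 loops back to register 16
    return reg + 1               # otherwise, the next highest numbered register


def select_registers(r_start, num_reg):
    """List the Num_Reg registers in the sequence beginning with rStart."""
    selected = []
    reg = r_start
    for _ in range(num_reg):
        selected.append(reg)
        reg = next_in_sequence(reg)
    return selected


print(select_registers(31, 3))   # first example 402  -> [31, 16, 17]
print(select_registers(30, 10))  # second example 406 -> [30, 31, 16, 17, 18, 19, 20, 21, 22, 23]
print(select_registers(15, 3))   # third example 408  -> [15, 16, 17]
print(select_registers(23, 3))   # rStart = 23        -> [23, 24, 25]
```

The sketch makes no distinction between preserved and non-preserved registers other than the wrap at register 31, which is why a starting register of 15 or 23 simply continues in increasing numerical order.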

In some cases, the subroutine may have the option of storing and/or restoring an out-of-sequence register. An out-of-sequence register is any register that does not fall in the predetermined sequence identified by the number of registers (Num_Reg) and the starting register (rStart). In these cases, a save instruction and/or a restore instruction may include information that explicitly indicates whether a particular out-of-sequence register is to be saved or restored. For example, a save instruction and/or a restore instruction may include a bit for a particular out-of-sequence register, which when set indicates that the particular out-of-sequence register is to be saved or restored. When the save or restore instruction indicates that the particular out-of-sequence register is to be saved or restored, that register replaces the last register in the predetermined sequence. Replacing the last register in the predetermined sequence with the particular out-of-sequence register allows an additional register to be easily saved/restored taking into account the fact that, in some cases, when certain preserved registers are saved or restored, they are saved or restored first to ensure that they have a predetermined location in the stack. Specifically, in some cases, when the return address register and/or the frame pointer are to be saved or restored, it is desirable to save/restore them first in the stack so that they are at a known offset from the top of the current stack frame. Accordingly, saving/restoring the out-of-sequence register last provides a simple way of saving or restoring an additional register, as saving the out-of-sequence register first is not a viable option.

In some examples the GP register (register 28 in the example described above with respect to Table 1) is a preserved register, and, in these examples, the out-of-sequence register may be the GP register. The GP register is a register that may be configured to point to an address in memory where data is stored to aid in accessing global data. Specifically, as described in more detail in reference to FIGS. 12 and 13, in some examples, a save instruction and/or a restore instruction may include information (e.g., a bit that may be referred to as the GP bit or argument) that explicitly indicates whether the GP register is to be saved/restored. In these cases, when the save instruction or restore instruction indicates that the GP register is to be saved or restored, the GP register replaces the last register in the predetermined sequence. For example, in a fourth example 410, rStart is 31, Num_Reg is 9, and the GP bit is set, meaning that nine registers are to be saved/restored: eight registers in the predetermined sequence beginning with register 31 and the GP register (28). According to the predetermined sequence 404 the nine registers that are stored are 31, 16, 17, 18, 19, 20, 21, 22, and 28 (which replaced register 23 as the last register). It will be evident to the person of skill in the art that the GP register is an example of an out-of-sequence register and any out-of-sequence register may be saved/restored in the same manner as the GP register. Thus embodiments can include the out-of-sequence register being a global pointer register.

Returning back to FIG. 3, in addition to identifying the registers to be saved or restored, the conversion unit 320 is configured to output one or more control signals to cause the execution unit to store the identified registers in the stack or restore the identified registers from data in the stack, in the predetermined sequence. In some cases, this may comprise outputting control signals to cause the execution unit to perform a load, or store, to/from the stack for each identified register. The control signals may take any suitable format. It will be evident to a person of skill in the art that the format of the one or more control signals generated by the conversion unit 320 and received by the execution unit 306 may vary between data processing apparatuses. For example, in some cases, the one or more control signals may comprise information explicitly identifying the specific address of the stack for each load or store, whereas, in other cases, the one or more control signals may comprise information identifying a single base address and a unique sub-operation number (e.g., iteration number) for each load/store, and the execution unit 306 may be configured to identify the address for each load/store from the base address and the sub-operation number (e.g., iteration number).

Processor architectures have been routinely categorized by describing either the underlying hardware architecture or microarchitecture of a given processor, or by referencing the instruction set executed by the processor. The latter, the ISA, describes the types and ranges of instructions available, rather than how the instructions are implemented in hardware. By referencing an instruction set, a given ISA can be implemented using a wide range of techniques, where the techniques can be chosen based on preference or need for execution speed, data throughput, power dissipation, and manufacturing cost, among many other criteria. The ISA serves as an interface between code that is to be executed on the processor and the hardware that implements the processor. ISAs, and the processors or computers based on them, are partitioned broadly into categories including complex instruction set computers (CISC) and reduced instruction set computers (RISC). The ISAs define types of data that can be processed; the state or states of the processor, where the state or states include the main memory and a variety of registers; and the semantics of the ISA. The semantics of the ISA typically include modes of memory addressing and memory consistency. In addition, the ISA defines the instruction set for the processor, whether there are many instructions (complex) or fewer instructions (reduced), and the model for control signals and data that are input and output. RISC architectures offer many advantages for processor design because, by reducing the number and variety of instructions, the hardware that implements the instructions can be simplified. Further, compilers, assemblers, linkers, etc., that convert the code to instructions executable by the architecture can be simplified and tuned for performance.

In order for a processor to process data, the data must be made available to the processor or process. As discussed throughout, pointers can be used to share data between and among processors, processes, etc., by providing a reference address or pointer to the data. The pointer can be provided rather than transferring the data to each processor or process that requires the data. The pointers that are used for passing data references can be local pointers known only to a given, local processor or process, or can be GPs. The GPs can be shared among multiple processors or processes. The GPs can be organized or grouped into a GP register. The registers can include general-purpose registers, floating point registers, and so on. While operating systems such as Linux™ can use a GP for position independent code (PIC), the use of the GP implies that a particular register explicitly is used to support PIC handling and execution. In contrast, the presently described RISC architecture uses instructions that implicitly reference a GP source. The GP source provides operands manipulated by the instructions. Use of instructions that implicitly use GP source operands allows bits within the instructions to be used for purposes other than explicitly referencing GP registers. The result of implicit GP source operands is that the instructions can free the bits previously used to declare the GP, and can therefore provide longer address offsets, extended register ranges, and so on.

A further capability of the presently described architecture includes support of the rotate and exchange or ROTX instruction. This instruction can support a variety of data operations such as bit reversal, bit swap, byte reversal, byte swap, shifting, striping, and so on, all within one instruction. The use of the ROTX instruction provides a computationally inexpensive technique for implementing multiple instructions within one instruction. The rotate and exchange instruction can overlay a barrel shifter or other shifter commonly available in the presently described architecture. Separately implementing these various rotate, exchange, or shift instructions would increase central processing unit (CPU) complexity because each instruction would have an impact on one or more aspects of the CPU design. By merging the various instructions into the ROTX instruction, CPU hardware that implemented the separate instructions can be combined to result in a less complex processor.

Processors commonly include a “mode” designator to indicate that the mode in which a processor is operating is based on a number of bytes, words, and so on. For some processor architecture techniques, a mode can include a 16-bit operation, a 32-bit operation, a 64-bit operation, and so on. One or more bits within an instruction can be used to indicate the mode in which a particular instruction is to be executed. In contrast, if the processor is designed to operate without mode bits within each instruction, then the mode bits within each instruction can be repurposed. The repurposed bits within the instruction can be used to implement the longer address offsets or extended register ranges described elsewhere. When an operation “mode” is still needed for a particular operation, then instructions that are code-density oriented can be added. Specific instructions can be implemented for 16-bit, 32-bit, 64-bit, etc., operations when needed, rather than implementing every instruction to include bits to define a mode, whether the mode is relevant to the instruction or not.

Storage used by processors can be organized and addressed using a variety of techniques. Typically, the storage or memory is organized as groups of bytes, words, or some other convenient size. To make storage or memory access more efficient, the access acquires as much data as reasonable with each access, thus reducing the number of accesses. Access to the memory is often most efficient in terms of computation or data transfer when the access is oriented or “aligned” to boundaries such as word boundaries. However, data to be processed does not always conveniently align to boundaries. For example, the operations to be performed by a processor may be byte oriented, the amount of data in memory may align to a byte boundary but not a word boundary, and so on. Accessing specific content such as a byte can require, under certain conditions and depending on the implementation of the processor, multiple read operations. To improve computational efficiency, unaligned memory access can be required. The unaligned memory access may be needed for computational if not access efficiency. A given ISA can support explicit unaligned storage or memory accesses. The general forms of the load and store instructions for the ISA can include unaligned load instructions and unaligned store instructions. The unaligned load instructions and the unaligned store instructions support a balance or trade-off between increased density of the code that is executed by a processor and reduced processor complexity. The unaligned load instructions and the unaligned store instructions can be implemented in addition to the standard load instructions and store instructions, where the latter instructions align to boundaries such as word boundaries. When an unaligned load or store is performed, the “extra” data such as bytes that can be accessed can be held temporarily for potential use by a subsequent read or store instruction (e.g., data locality).

For various reasons, execution of code can be stopped at a point in time and restarted at a later point in time, after a duration of time, and so on. The stopping and restarting of code execution can result from an exception occurring, receiving a control signal such as a fire signal or done signal, detection of an interrupt signal, and so on. In order to efficiently handle save and restore operations, an ISA can include instructions and hardware specifically tuned for the save and the restore operations. A save instruction can save registers, where the registers can be stored in a stack. The saved registers can include source registers. A stack pointer can be adjusted to account for the stored registers. The saving can also include storing a local stack frame, where a stack frame can include a collection of data (or registers) on a stack that is associated with an instruction, a subprogram call, a function call, etc., that caused the save operation. The restore operation can reverse the save technique. The registers that were saved by the save operation can be restored. The restored registers can include destination registers. When the registers have been restored, the restore operation can cause a jump to a return address. Code execution can continue beginning with the return address.

As will be described in more detail below, with respect to FIGS. 6-11, the conversion unit 320 may be configured to identify the registers to be saved/restored and generate and output the control signals which cause the execution unit to save/restore the identified registers to/from the stack via an iterative process that uses a counter. At each iteration, the conversion unit 320 identifies a register to be saved/restored and outputs one or more control signals which cause the execution unit to save that register to the stack or restore that register from the correct address of the stack. Specifically, FIG. 6 illustrates an example conversion unit, FIG. 7 illustrates an example method for generating and outputting the control signals, and FIGS. 8 to 11 illustrate examples of the loads and stores that are performed in response to different save/restore instruction arguments (e.g., Num_Reg, rStart, GP).

The execution unit 306 is configured to perform the operations identified by the control signals which are output by the decode unit 304. In some cases, the execution unit 306 may comprise one or more arithmetic logic units (ALUs). The execution unit 306 may have one or more sub-units dedicated to performing certain operations. For example, the execution unit 306 may comprise a sub-unit for executing stores to memory and/or a sub-unit for executing loads from memory.

Reference is now made to FIG. 5, which illustrates an example method 500 for storing or restoring registers that may be implemented by the decode unit 304 of FIG. 3. The method 500 begins at block 502 where the decode unit 304 (e.g., the detection unit 318) receives an instruction to be executed by the execution unit 306. As described above, the instruction may have been provided to the decode unit 304 by a fetch unit configured to fetch instructions of a program or subroutine (in program/subroutine order) in memory 316 as indicated by a PC. Once the instruction has been received, the method 500 proceeds to block 504.

At block 504, the decode unit 304 (e.g., the detection unit 318) determines whether the received instruction is a save instruction or a restore instruction. A received save instruction or restore instruction indicates (e.g., as arguments) the total number of registers to be saved/restored (Num_Reg) and the starting register (rStart). Together Num_Reg and rStart identify the registers to be saved or restored. As described above, the decode unit 304 (e.g., detection unit 318) may be configured to identify save and/or restore instructions by a unique pattern of bits in the instruction. If the decode unit (e.g., the detection unit 318) determines that the received instruction is a save instruction or a restore instruction, the method 500 proceeds to block 506, otherwise the method 500 may end or proceed to other tasks.

At block 506, the decode unit 304 (e.g., the conversion unit 320) generates one or more control signals to cause the execution unit to save or restore Num_Reg registers in a predetermined sequence beginning with rStart to/from consecutive addresses in the stack. As described above, in the predetermined sequence each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register. The method 500 may then end or proceed to other tasks.

In some cases, the method may further comprise outputting one or more control signals that cause the execution unit to update the stack pointer to point to the bottom of either the current stack frame (in the case of a save) or the previous stack frame (in the case of a restore). In these cases, the save instruction and/or restore instruction may identify the size of the current stack frame (u) and the stack pointer may be updated by adding or subtracting the size of the current stack frame. For example, where the stack is configured to grow down, the size of the current stack frame may be subtracted from the stack pointer in the case of a save and added to the stack pointer in the case of a restore.
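The stack pointer update for a stack that grows down can be sketched as follows. This is a minimal illustration rather than the described hardware: the function names are hypothetical, and the frame size u is assumed to be supplied in bytes.

```python
def stack_pointer_after_save(stack_pointer, frame_size):
    """For a stack that grows down, a save subtracts the frame size (u)."""
    return stack_pointer - frame_size


def stack_pointer_after_restore(stack_pointer, frame_size):
    """For a stack that grows down, a restore adds the frame size (u) back."""
    return stack_pointer + frame_size


sp_on_entry = 0x8000
sp_in_subroutine = stack_pointer_after_save(sp_on_entry, 32)        # 0x7FE0
sp_on_return = stack_pointer_after_restore(sp_in_subroutine, 32)    # 0x8000
```

A save followed by a restore with the same frame size leaves the stack pointer where it started, so the caller sees the previous stack frame unchanged.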

Reference is now made to FIG. 6, which illustrates an example implementation of the conversion unit 320 of FIG. 3. In this example implementation, the conversion unit 320 is configured to output the control signals to cause the execution unit to store/load the appropriate registers to/from the stack through an iterative process, in which, in each iteration, the conversion unit 320 outputs one or more control signals that cause the execution unit to store/load one of the appropriate registers to/from the stack.

The example conversion unit 320 of FIG. 6 comprises control logic 602, a counter 604, register calculation logic 606, and address calculation logic 608.

The control logic 602 is configured to receive a save or restore instruction from the detection unit 318 and to generate and output control signals to cause the execution unit to store/load the appropriate registers to/from the stack. This is an iterative process, in which, in each iteration, the conversion unit 320 generates and outputs one or more control signals that cause the execution unit to store/load one of the appropriate registers to/from the stack. In other words, for each iteration, the control logic 602 generates and outputs one or more control signals to cause the execution unit to perform a load of a register from the stack or to perform a store of a register to the stack. Since one iteration is used for each register to be saved or restored, the number of iterations is equal to Num_Reg.

The control logic 602 uses a counter 604 to keep track of the number of registers that have been saved or restored (e.g., the number of stores (for a save instruction), or the number of loads (for a restore instruction), that have been issued to the execution unit). For each cycle (e.g., each execution cycle of the data processing apparatus 300) that the counter is not equal to Num_Reg (indicating that the loads and stores for all appropriate registers have not been issued to the execution unit), an iteration of the process is implemented. Specifically, for each cycle that the counter is not equal to Num_Reg, the control logic 602 generates and outputs one or more control signals that cause the execution unit to store/load one of the appropriate registers to/from the stack (which may be referred to as issuing a load/store instruction to the execution unit).

In some cases, generating and outputting the one or more control signals in an iteration comprises identifying, using the register calculation logic 606, the register to be saved/restored in the current iteration; identifying, using the address calculation logic 608, the address of the stack in which the register is to be saved/restored; and outputting control signals to cause the execution unit to perform a load/store of the identified register to/from the identified address. At the end of the iteration, the control logic 602 increments the counter 604.

The register calculation logic 606 is configured to identify the register to be stored/loaded in a current iteration based on the counter 604 and the starting register (rStart) so that the registers that are saved or restored are the Num_Reg registers of a predetermined sequence starting with rStart, wherein, in the predetermined sequence, each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register. In other words, the register calculation logic 606 calculates the register to be saved or restored in the current iteration as the Nth register in a predetermined sequence starting with rStart where N is the iteration number.

In some cases, the register calculation logic 606 may be configured to identify the register to be stored/loaded in the current iteration as rStart plus the counter if the result is equal to or less than the largest preserved register number, and the smallest preserved register number plus (rStart+counter) mod (largest preserved register number+1) otherwise. For example, where the preserved registers comprise registers 16-23, 30, and 31, the largest preserved register number is 31 and the smallest preserved register number is 16. In this example, the register calculation logic 606 may be configured to identify the register to be stored/loaded in the current iteration as rStart+counter if the result is equal to or less than 31, and 16+((rStart+counter) mod 32) otherwise.

For example, as shown in FIG. 8, when the starting register (rStart) is 31 and the number of registers (Num_Reg) is 3, there will be three iterations of the process. In the first iteration, the counter is zero, so the register to be stored/loaded is 31 (rStart+counter=31+0=31); in the second iteration, the counter is one, so the register to be stored/loaded is 16 (16+((rStart+counter) mod 32)=16+((31+1) mod 32)=16+0) because rStart+counter=31+1=32 is greater than the largest preserved register number (31); and, in the third iteration, the counter is 2, so the register to be stored/loaded is 17 (16+((rStart+counter) mod 32)=16+((31+2) mod 32)=16+1) because rStart+counter=31+2=33 is greater than the largest preserved register number (31).
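The per-iteration calculation worked through above can be sketched in a few lines. The sketch assumes the example in which the largest preserved register number is 31 and the smallest is 16. The function name register_for_iteration is hypothetical.

```python
SMALLEST_PRESERVED = 16  # assumed, from the example numbering
LARGEST_PRESERVED = 31   # assumed, from the example numbering


def register_for_iteration(r_start, counter):
    """Register to store/load when the counter has the given value."""
    total = r_start + counter
    if total <= LARGEST_PRESERVED:
        return total
    # Past register 31: wrap around to the smallest preserved register.
    return SMALLEST_PRESERVED + (total % (LARGEST_PRESERVED + 1))


# FIG. 8 example: rStart = 31, Num_Reg = 3, counter runs 0, 1, 2.
fig8_registers = [register_for_iteration(31, counter) for counter in range(3)]
print(fig8_registers)  # [31, 16, 17]
```

Because the counter starts at zero, the first iteration always yields rStart itself, and the modulo term only comes into play once the running total passes 31.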

In some cases, this may be implemented using equation (1) set out below, where “&” indicates a bitwise AND, “|” indicates a bitwise OR, and “<< 4” indicates a shift left by 4 bits.

((rStart+counter) & 0x1f) | (rStart[4] << 4)  (1)

For example, using the example of FIG. 8 and equation (1) the results of each iteration are shown in Table 2.

TABLE 2

Iteration  rStart = 31  Counter  rStart + counter (a)  (a) & 0x1f  rStart[4] << 4 (b)  (a) | (b)  Reg #
1          011111       000000   011111                011111      010000              011111     31
2          011111       000001   100000                000000      010000              010000     16
3          011111       000010   100001                000001      010000              010001     17
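As a sanity check on equation (1), the following sketch evaluates the bitwise form and compares it with the modular form for the example preserved registers 16-23, 30, and 31. The function names are hypothetical. The comparison is limited to starting registers 16 to 31 and counters 0 to 14 (i.e., up to 15 registers), which is an assumption made for this illustration.

```python
def next_register_bitwise(r_start: int, counter: int) -> int:
    """Equation (1): ((rStart + counter) & 0x1f) | (rStart[4] << 4)."""
    bit4 = (r_start >> 4) & 1  # rStart[4], i.e., bit 4 of the starting register
    return ((r_start + counter) & 0x1F) | (bit4 << 4)


def next_register_modular(r_start: int, counter: int) -> int:
    """Modular form: rStart+counter if <= 31, else 16 + ((rStart+counter) mod 32)."""
    candidate = r_start + counter
    return candidate if candidate <= 31 else 16 + (candidate % 32)


# Reproduce the Reg # column of Table 2 (rStart = 31, three iterations).
table2 = [next_register_bitwise(31, counter) for counter in range(3)]
print(table2)  # [31, 16, 17]

# Both forms agree over the assumed range of starting registers and counters.
forms_agree = all(
    next_register_bitwise(r, c) == next_register_modular(r, c)
    for r in range(16, 32)
    for c in range(15)
)
print(forms_agree)  # True
```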

As described above, in some cases the save and/or restore instruction may be configured to indicate whether the GP register should be saved or restored (e.g., via the GP bit). In these cases, the register calculation logic 606 may be configured to identify, when the received save or restore instruction indicates that the GP register should be saved or restored, the GP register as the register to be stored/loaded in the last iteration (e.g., the iteration when counter+1=Num_Reg).

For example, as shown in FIG. 11, when the starting register (rStart) is 31, the number of registers (Num_Reg) is 9, and the save or restore instruction indicates the GP register is to be saved or restored, there will be nine iterations of the process. In the first iteration, rStart+counter is less than or equal to 31, so the register to be stored or loaded is calculated by rStart+counter, which results in a register number of 31. In the second to eighth iterations, rStart+counter is greater than 31, so the register to be stored or loaded is calculated by 16+((rStart+counter) mod 32), which results in register numbers 16 to 22 respectively. The ninth iteration is the last iteration and the GP bit is set, so the register number for this iteration is the GP register number (28 in this example).
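The GP handling described above may be sketched as follows. The function name `registers_to_transfer` is hypothetical. The GP register number (28) and the preserved registers (16-23, 30, and 31) are taken from the example and are assumptions of this sketch.

```python
GP_REGISTER = 28  # GP register number used in the example


def registers_to_transfer(r_start: int, num_reg: int, gp: bool) -> list:
    """List the registers selected over Num_Reg iterations."""
    registers = []
    for counter in range(num_reg):
        is_last_iteration = counter + 1 == num_reg
        if gp and is_last_iteration:
            # The GP register replaces the register of the last iteration.
            registers.append(GP_REGISTER)
        else:
            candidate = r_start + counter
            registers.append(candidate if candidate <= 31 else 16 + (candidate % 32))
    return registers


# FIG. 11 example: rStart = 31, Num_Reg = 9, GP bit set.
fig11 = registers_to_transfer(31, 9, True)
print(fig11)  # [31, 16, 17, 18, 19, 20, 21, 22, 28]
```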

The address calculation logic 608 is configured to calculate the address in memory where the register identified in the current iteration is to be stored to or restored from. Since the registers are to be stored at consecutive addresses in the stack, the address calculation logic 608 may be configured to calculate the address in memory that the register identified in the current iteration is to be stored to/restored from as an offset from the stack pointer, wherein the offset is based on the counter.

As described above, when a subroutine is called (and thus when a save instruction is executed), the stack pointer points to the bottom of the previous stack frame and, thus, new data items can be written to the top of the new stack frame via offsets from the stack pointer. Accordingly, in some cases, when the received instruction is a save instruction, the address calculation logic 608 may be configured to calculate the offset as (−(counter+1)*X) where X is the size (in bytes) of an entry of the stack. For example, where each entry in the stack is 4 bytes, then X may be equal to 4. This results in the relevant registers being stored in the first Num_Reg entries in the current stack frame (i.e., the stack frame of the callee subroutine).

For example, as shown in FIG. 9, when the starting register (rStart) is 30 and the number of registers (Num_Reg) is 10 there will be ten iterations of the process. In the first iteration, the counter is zero, so the offset is calculated as −4 (−(counter+1)*4=−(0+1)*4=−4), which causes register 30 to be stored at an offset of −4 from the stack pointer ($sp), which is the first entry of the current stack frame. In the second iteration, the counter is one, so the offset is calculated as −8 (−(counter+1)*4=−(1+1)*4=−8), which causes register 31 to be stored at an offset of −8 from the stack pointer ($sp), which is the second entry of the current stack frame. Similarly, in the third to tenth iterations, the counter is 2 to 9, respectively, which results in offsets of −12, −16, −20, −24, −28, −32, −36, and −40, respectively, which causes the registers 16-23 to be stored in the third to tenth entries of the current stack frame.
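The save offsets of this example may be reproduced with the short sketch below. The function name `save_offsets` is hypothetical, and the 4-byte entry size is the value assumed in the example.

```python
def save_offsets(num_reg: int, entry_size: int = 4) -> list:
    """Offsets from the stack pointer for a save: -(counter + 1) * X."""
    return [-(counter + 1) * entry_size for counter in range(num_reg)]


# FIG. 9 example: rStart = 30, Num_Reg = 10, so the registers are 30, 31, 16-23.
fig9_registers = [30, 31, 16, 17, 18, 19, 20, 21, 22, 23]
fig9 = list(zip(fig9_registers, save_offsets(10)))
for register, offset in fig9:
    print(f"register {register} -> $sp{offset:+d}")
```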

In contrast, when a subroutine completes (and thus when a restore instruction is executed), the stack pointer points to the bottom of the current stack frame (i.e., the stack frame for the current subroutine). However, the registers were stored onto the stack at known offsets from the bottom of the previous stack frame. Accordingly, in some cases, where the instruction is a restore instruction, the address calculation logic 608 may be configured to calculate the offset as (u−(counter+1)*X), where u is the length (e.g., in bytes) of the current stack frame. This effectively moves the stack pointer back to the bottom of the previous stack frame and then calculates an offset from that address. In some cases, the frame length may be specified in the restore instruction as an argument.

For example, as shown in FIG. 10, when the starting register (rStart) is 15, the number of registers (Num_Reg) is 3, and the frame length is 44 bytes (i.e., 11 words), there will be three iterations of the process. In the first iteration, the counter is zero, so the offset is calculated as 40 ((u−(counter+1)*4)=44−(0+1)*4=40), which is the first entry of the current stack frame. In the second iteration, the counter is one, so the offset is calculated as 36 ((u−(counter+1)*4)=44−(1+1)*4=36), which is the second entry of the current stack frame. In the third iteration, the counter is two, so the offset is calculated as 32 ((u−(counter+1)*4)=44−(2+1)*4=32).
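The restore offsets of this example, and the fact that they address the same stack entries that the save wrote, may be checked with the sketch below. The function names and the stack pointer value are hypothetical. The sketch assumes a stack that grows down, so that the stack pointer is decreased by the frame length u between the save and the restore.

```python
def save_offsets(num_reg: int, entry_size: int = 4) -> list:
    """Offsets from the stack pointer for a save: -(counter + 1) * X."""
    return [-(counter + 1) * entry_size for counter in range(num_reg)]


def restore_offsets(num_reg: int, frame_length: int, entry_size: int = 4) -> list:
    """Offsets from the stack pointer for a restore: u - (counter + 1) * X."""
    return [frame_length - (counter + 1) * entry_size for counter in range(num_reg)]


# FIG. 10 example: Num_Reg = 3, frame length u = 44 bytes.
fig10 = restore_offsets(3, 44)
print(fig10)  # [40, 36, 32]

# Hypothetical stack pointer value at the time of the save.
sp_at_save = 0x8000
sp_at_restore = sp_at_save - 44  # stack grows down by the frame length
save_addresses = [sp_at_save + offset for offset in save_offsets(3)]
restore_addresses = [sp_at_restore + offset for offset in fig10]
print(save_addresses == restore_addresses)  # True
```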

Although the conversion unit 320 of FIG. 6 has been described as performing one iteration of the process per cycle of the data processing apparatus, in other examples the conversion unit 320 may be configured to perform more than one iteration per cycle. The conversion unit 320 may be able to execute more than one iteration per cycle by sequentially executing multiple iterations in a cycle or by performing multiple iterations in parallel in a cycle.

As described above, the control signals that are generated by the conversion unit (e.g., control logic) may vary between data processing apparatuses. For example, in some cases, the control signals for an iteration may comprise information indicating the register (e.g., the register number) to be saved or restored and the complete address in the stack where the register is to be saved to or restored from. However, in other cases, the control signals for an iteration may comprise information indicating the register (e.g., the register number) to be saved or restored and information from which the address can be derived (e.g., the offset or the counter value), in which case the execution unit calculates the complete address.

Reference is now made to FIG. 7, which illustrates an example method 700 of converting a save or restore instruction into none, one, or more than one store (for a save instruction) or load (for a restore instruction) and causing an execution unit (e.g., execution unit 306) to execute the none, one, or more than one store/load, which may be implemented, for example, by the conversion unit 320 of FIGS. 3 and 6. In some examples, block 506 of method 500 of FIG. 5 may be implemented using method 700 of FIG. 7.

The method 700 begins at block 702, where the conversion unit 320 (e.g., control logic 602) receives a save instruction or a restore instruction. The received save instruction or restore instruction indicates (e.g., as arguments) the total number of registers to be saved/restored (Num_Reg) and the starting register (rStart). Together, Num_Reg and rStart identify the registers to be saved or restored. In some cases, the received save/restore instruction may also indicate whether the GP register is to be saved/restored. As described above, in some examples the save instruction or restore instruction may be provided to the conversion unit 320 by a detection unit 318 in response to the detection unit 318 identifying a received instruction as a save instruction or a restore instruction. Once the conversion unit 320 has received a save instruction or a restore instruction, the method 700 proceeds to block 704.

At block 704, the conversion unit 320 (e.g., control logic 602) sets a counter (e.g., counter 604) used to track the number of registers that have been saved/restored (e.g., the number of load/store instructions that have been issued to the execution unit) to zero to indicate that no registers have been saved or restored (e.g., no load/stores have been issued to the execution unit). The method 700 then proceeds to block 706 where the conversion unit 320 (e.g., control logic 602) determines whether the desired number of registers have been saved or restored. The conversion unit 320 (e.g., control logic 602) may determine that the desired number of registers have been saved or restored if the counter is equal to Num_Reg. If it is determined at block 706 that the desired number of registers have been saved or restored, then the method 700 may end or may proceed to block 720. If, however, it is determined at block 706 that the desired number of registers have not been saved or restored, then there is at least one register that still needs to be saved or restored, and so the method 700 proceeds to block 708.

At block 708, the conversion unit 320 (e.g., address calculation logic 608) calculates the address in the stack where the next register is to be saved to, or restored from, based on the counter. As described above, the conversion unit may be configured to calculate the address in the stack where the next register is to be saved to, or restored from, based on the stack pointer and an offset where the offset is based on the counter. In some cases, the conversion unit may be configured to calculate the offset to be −(counter+1)*X where X is the size (in bytes) of each entry in the stack when the received instruction is a save instruction, and to calculate the offset to be (u−(counter+1)*X), where u is the size of the current stack frame, when the received instruction is a restore instruction. Once the address has been calculated, the method 700 may proceed to block 710 or block 712, depending on whether the save and restore instructions are configured to include information indicating whether the GP register is to be saved or restored. Specifically, if the save and restore instructions are configured to include information indicating whether the GP register is to be saved or restored, the method 700 proceeds to block 710; otherwise, the method 700 proceeds directly to block 712.

At block 710, the conversion unit 320 (e.g., register calculation logic 606) determines (i) whether the GP register is to be saved or restored (e.g., the GP flag is set in the received instruction) and (ii) whether this is the last register to be saved or restored. In some cases, the conversion unit 320 (e.g., register calculation logic 606) may be configured to determine that this is the last register to be saved or restored if the counter+1 is equal to Num_Reg. If it is determined that the GP register is to be saved or restored, and this is the last register to be saved or restored, then the method 700 proceeds to block 714, where the next register to be saved or restored is identified as the GP register (e.g., register 28 in some ISAs). The method then proceeds to block 716. If, however, it is determined that the GP register is not to be saved or restored, or that this is not the last register to be saved or restored, then the method 700 proceeds to block 712.

At block 712, the conversion unit 320 (e.g., register calculation logic 606) calculates or identifies the number of the next register to be saved or restored based on the identified starting register (rStart) and the counter so that the registers that are stored are the Num_Reg registers of a predetermined sequence starting with rStart, wherein, in the predetermined sequence, each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register. In other words, the conversion unit 320 (e.g., register calculation logic 606) calculates the next register to be saved or restored as the Nth register in a predetermined sequence starting with rStart where N is equal to the counter+1.

In some cases, the conversion unit 320 (e.g., register calculation logic 606) may be configured to identify the next register to be stored/loaded as (a) rStart plus the counter if the result is equal to or less than the largest preserved register number or (b) the smallest preserved register number+((rStart+counter) mod (largest preserved register number+1)) otherwise. For example, where the preserved registers comprise registers 16-23, 30, and 31, the largest preserved register number is 31 and the smallest preserved register number is 16. In this example, the conversion unit 320 (e.g., register calculation logic 606) may be configured to identify the register to be stored/loaded in the current iteration to be rStart+counter if the result is equal to or less than 31, and 16+((rStart+counter) mod 32) otherwise. Once the number of the next register has been identified or calculated, the method 700 proceeds to block 716.

At block 716, the conversion unit 320 outputs one or more control signals to cause the execution unit to perform a load of the identified register from the identified/calculated address (for a restore instruction) or a store of the identified register to the identified/calculated address (for a save instruction). The method 700 then proceeds to block 718, where the conversion unit 320 (e.g., control logic) increments the counter to indicate that one more register has been saved/restored. The method 700 then proceeds back to block 706 to determine if all of the registers have been saved/restored.

At block 720, the conversion unit outputs one or more control signals that cause the execution unit to update the stack pointer to either point to the bottom of the current stack frame (in the case of a save) or to point to the bottom of the previous stack frame (in the case of a restore). Where the stack is configured to grow down, the size of the current stack frame (u) may be subtracted from the stack pointer in the case of a save and added to the stack pointer in the case of a restore.

The method 700 can be described as an iterative method where one iteration comprises execution of blocks 706, 708, 712/714, 716 and 718.
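The iterative flow of method 700 may be sketched as a short software model. This is an illustration under stated assumptions rather than a description of the conversion unit 320: the function name `method_700`, the tuple representation of the loads, stores, and stack pointer update, and the default entry size of 4 bytes are all hypothetical. The preserved registers (16-23, 30, and 31), the GP register number (28), and a stack that grows down are taken from the examples above.

```python
def method_700(kind, r_start, num_reg, frame_length, gp=False, entry_size=4):
    """Model the conversion of a save/restore into stores/loads (blocks 702-720)."""
    if kind not in ("save", "restore"):
        raise ValueError("kind must be 'save' or 'restore'")
    operations = []
    counter = 0                                        # block 704
    while counter != num_reg:                          # block 706
        if kind == "save":                             # block 708
            offset = -(counter + 1) * entry_size
        else:
            offset = frame_length - (counter + 1) * entry_size
        if gp and counter + 1 == num_reg:              # block 710
            register = 28                              # block 714 (GP register)
        else:                                          # block 712
            candidate = r_start + counter
            register = candidate if candidate <= 31 else 16 + (candidate % 32)
        micro_op = "store" if kind == "save" else "load"
        operations.append((micro_op, register, offset))  # block 716
        counter += 1                                   # block 718
    # Block 720: stack grows down, so subtract u on save and add u on restore.
    delta = -frame_length if kind == "save" else frame_length
    operations.append(("add_sp", delta))
    return operations


for operation in method_700("save", 31, 3, 16, gp=True):
    print(operation)
```

In this model a Num_Reg of zero yields no loads or stores at all, only the stack pointer update, which corresponds to the "none" case mentioned above.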

Reference is now made to FIGS. 12 and 13, which illustrate example formats for save and restore instructions, which may be received by the decode unit 304 described herein. An ISA may comprise any combination of these instructions. Specifically, FIG. 12 illustrates example formats for save instructions and FIG. 13 illustrates example formats for restore instructions. Binary values in these figures represent the specific bit patterns that identify the instruction and that are decoded by the decode unit 304. The remaining fields are named instruction arguments. Also, fields with names ending in square brackets such as “s[7:0]” and “s[0]” specify a particular range of bits for the named argument using a Verilog style syntax. A single argument value may be split into more than one field in the instruction encoding, with the bit ranges specified by each field explicitly in this way. All non-specified bits will be set to zero (e.g., if s[32] is not specified, then bit 32 will be set to zero). If no explicit bit range is specified, then the argument represents the least significant bits of the value.
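The Verilog style bit-range convention may be illustrated with the sketch below, which assembles an argument value from one or more fields and leaves all non-specified bits at zero. The function name `assemble_argument` and the example field values are hypothetical and do not correspond to any particular field of FIG. 12 or FIG. 13.

```python
def assemble_argument(fields):
    """Build an argument value from (high_bit, low_bit, field_value) tuples.

    Each tuple places field_value into bits [high_bit:low_bit] of the result.
    Bits not covered by any field remain zero.
    """
    value = 0
    for high_bit, low_bit, field_value in fields:
        width = high_bit - low_bit + 1
        mask = (1 << width) - 1
        value |= (field_value & mask) << low_bit
    return value


# A hypothetical argument split into two fields, s[7:1] and s[0].
split_value = assemble_argument([(7, 1, 0b1010101), (0, 0, 1)])
print(bin(split_value))  # 0b10101011

# A hypothetical field u[11:3]: bits 2 to 0 stay zero, so the result is a multiple of 8.
aligned_value = assemble_argument([(11, 3, 5)])
print(aligned_value)  # 40
```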

FIG. 12 illustrates an example 16-bit format of a save instruction (SAVE [16]) and an example 32-bit format of a save instruction (SAVE [32]), which cause the Num_Reg registers in a predetermined sequence starting with rStart to be saved to the stack. In the SAVE [16] instruction, seven bits (bits 8 and 10-15) are used to identify the instruction as a SAVE [16] instruction, four bits (bits 0 to 3) are used to identify the number of registers (up to 15) to be saved, and four bits (bits 4 to 7) are used to identify the size of the stack frame. Specifically, bits 4 to 7 are used to specify bits 7 to 4 of the size wherein bits 3 to 0 of the size are set to zero, which limits the stack frame size to a multiple of 16 bytes. Finally, a single bit (bit 9) is used to identify the starting register. If the bit is set, then rStart is register 31; otherwise rStart is register 30.
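The argument fields of the SAVE [16] format described above may be extracted as in the following sketch. The function name `decode_save16_args` is hypothetical. The sketch does not check the seven identifying bits, whose values are shown in FIG. 12 and are not reproduced here, and it assumes that instruction bits 4 to 7 map to bits 4 to 7 of the size in the same order.

```python
def decode_save16_args(instruction: int):
    """Return (rStart, Num_Reg, frame size) from a 16-bit save instruction word."""
    num_reg = instruction & 0xF                   # bits 0 to 3
    frame_size = ((instruction >> 4) & 0xF) << 4  # bits 4 to 7 give size bits 7 to 4
    r_start = 31 if (instruction >> 9) & 1 else 30  # bit 9 selects register 31 or 30
    return r_start, num_reg, frame_size


# Hypothetical word with bit 9 set, size field 2, and count 3.
# The identifying bits are left at zero for illustration only.
example_word = (1 << 9) | (2 << 4) | 3
print(decode_save16_args(example_word))  # (31, 3, 32)
```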

In the SAVE [32] instruction, 13 bits (bits 0-1, 12-15, 20 and 26-31) are used to identify the instruction as a SAVE [32] instruction, four bits (bits 16 to 19) are used to identify the number of registers (Num_Reg) (up to 15) to be saved, and nine bits (bits 3 to 11) are used to identify the size of the stack frame (u). Specifically, bits 3 to 11 are used to specify bits 11 to 3 of the size wherein bits 2 to 0 of the size are set to zero, which limits the stack frame size to a multiple of 8 bytes. Five bits (bits 21 to 25) are used to explicitly identify the starting register (rStart) and a single bit (bit 2) is used to indicate whether the GP register is to be saved to the stack. When GP (bit 2) is set, the GP register is to be saved to the stack, and when GP is not set, the GP register is not to be saved to the stack.
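Similarly, the argument fields of the SAVE [32] format may be extracted as sketched below. The function names are hypothetical. The sketch ignores the 13 identifying bits, whose values are shown in FIG. 12 and are not reproduced here, and it assumes that instruction bits 3 to 11 map to bits 3 to 11 of the size in the same order.

```python
def encode_save32_args(r_start: int, num_reg: int, frame_size: int, gp: bool) -> int:
    """Pack the argument fields only; identifying bits are left at zero."""
    return (
        ((r_start & 0x1F) << 21)          # bits 21 to 25: rStart
        | ((num_reg & 0xF) << 16)         # bits 16 to 19: Num_Reg
        | (((frame_size >> 3) & 0x1FF) << 3)  # bits 3 to 11: size bits 11 to 3
        | (int(gp) << 2)                  # bit 2: GP
    )


def decode_save32_args(instruction: int):
    """Return (rStart, Num_Reg, frame size, GP flag) from a 32-bit instruction word."""
    r_start = (instruction >> 21) & 0x1F
    num_reg = (instruction >> 16) & 0xF
    frame_size = ((instruction >> 3) & 0x1FF) << 3
    gp = bool((instruction >> 2) & 1)
    return r_start, num_reg, frame_size, gp


example_word = encode_save32_args(30, 10, 48, True)
print(decode_save32_args(example_word))  # (30, 10, 48, True)
```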

FIG. 13 illustrates an example 16-bit format of a restore instruction (RESTORE.JRC [16]) and two examples of a 32-bit format of a restore instruction (RESTORE.JRC [32] and RESTORE [32]), which cause the Num_Reg registers in a predetermined sequence starting with rStart to be restored from the stack. In the RESTORE.JRC [16] instruction, seven bits (bits 8 and 10-15) are used to identify the instruction as a RESTORE.JRC [16] instruction, four bits (bits 0 to 3) are used to identify the number of registers (up to 15) to be restored, and four bits (bits 4 to 7) are used to identify the size of the stack frame (u). Specifically, bits 4 to 7 are used to specify bits 7 to 4 of the size wherein bits 3 to 0 of the size are set to zero, which limits the stack frame size to a multiple of 16 bytes. Finally, a single bit (bit 9) is used to identify the starting register (rStart). If this bit is set, then rStart is register 31; otherwise rStart is register 30.

In the RESTORE.JRC [32] and RESTORE [32] instructions, 13 bits (bits 0-1, 12-15, 20 and 26-31) are used to identify the instruction as a RESTORE.JRC [32] instruction or a RESTORE [32] instruction, four bits (bits 16 to 19) are used to identify the number of registers (Num_Reg) (up to 15) to be restored, and nine bits (bits 3 to 11) are used to identify the size of the stack frame (u). Specifically, bits 3 to 11 are used to specify bits 11 to 3 of the stack frame size, wherein bits 2 to 0 of the size are set to zero, which limits the stack frame size to a multiple of 8 bytes. Five bits (bits 21 to 25) are used to explicitly identify the starting register (rStart), and a single bit (GP, bit 2) is used to indicate whether the GP register is to be restored from the stack. When GP (bit 2) is set, the GP register is to be restored from the stack, and when GP is not set, the GP register is not to be restored from the stack.

In some cases, the instruction set of the ISA implemented by the data processing apparatus 300 may further comprise a load word multiple (LWM) instruction and/or a store word multiple (SWM) instruction. A load word multiple instruction causes multiple words to be loaded from consecutive addresses in memory and stored in a series of registers, and a store word multiple instruction causes multiple words in a series of registers to be stored in consecutive addresses in memory. Load/store word multiple instructions, like save and restore instructions, are macro instructions, which are converted by the decode unit into one or more micro instructions (load/store instructions). LWM/SWM instructions differ from save and restore instructions in that they are intended to store data from, or load data to, a general set of registers to/from a general memory address, rather than specifically to save data from, or restore data to, preserved registers to/from addresses in the current subroutine's stack frame.

Where store/load word multiple instructions specify the number of words (Num_Words) to be stored or loaded, the starting register (rStart) from which the data is to be stored or to which the data is to be loaded, and a base register identifying the memory address to which the data is to be stored or from which the data is to be loaded, the registers to be read from/written to can be determined in the same manner as described above for identifying the registers to be saved/restored; and/or the addresses to be written to/read from can be determined in the same manner as the addresses of the stack to be saved to/restored from. Specifically, the registers from which the data is to be stored, or to which the data is to be loaded, may be identified as the Num_Words registers in a predetermined sequence (e.g., sequence 404) beginning with the starting register (rStart), wherein, in the predetermined sequence, each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register. Similarly, the addresses to which the data is to be stored, or from which the data is to be loaded, may be determined from the base register and an offset, which is based on a counter value as described above. This allows the same conversion unit used to convert a received save instruction or restore instruction into none, one, or more than one load and/or store, to convert a received load/store word multiple instruction into none, one, or more than one load and/or store, and to output one or more control signals which cause the execution unit to execute the load(s) or store(s) so as to store or load a plurality of words to/from memory.

Accordingly, in these cases, the decode unit 304 may be further configured to determine whether the received instruction is a load word multiple instruction or a store word multiple instruction (specifying a number of words to store/load and a starting register), and, in response to determining that the received instruction is a load word multiple instruction or a store word multiple instruction, forward the received instruction to the conversion unit, which causes the conversion unit to output one or more control signals to load data to, or store data from, the Num_Words registers in the predetermined sequence beginning with the starting register.

The data processing apparatus and conversion unit of FIGS. 3 and 6 are shown as comprising a number of functional blocks. This is schematic only and is not intended to define a strict division between different logic elements of such entities. Each functional block may be provided in any suitable manner. It is to be understood that intermediate values described herein as being formed by the data processing apparatus or the conversion unit need not be physically generated by the data processing apparatus or the conversion unit at any point and may merely represent logical values, which conveniently describe the processing performed by the data processing apparatus or the conversion unit between its input and output.

The data processing apparatuses described herein may be embodied in hardware on an integrated circuit. The data processing apparatus described herein may be configured to perform any of the methods described herein. Generally, any of the functions, methods, techniques, or components described above can be implemented in software, firmware, hardware (e.g., fixed logic circuitry), or any combination thereof. The terms “module,” “functionality,” “component”, “element”, “unit”, “block”, and “logic” may be used herein to generally represent software, firmware, hardware, or any combination thereof. In the case of a software implementation, the module, functionality, component, element, unit, block, or logic represents program code that performs the specified tasks when executed on a processor. The algorithms and methods described herein could be performed by one or more processors executing code that causes the processor(s) to perform the algorithms/methods. Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.

The terms computer program code and computer readable instructions as used herein refer to any kind of executable code for processors, including code expressed in a machine language, an interpreted language, or a scripting language. Executable code includes binary code, machine code, bytecode, code defining an integrated circuit (such as a hardware description language or netlist), and code expressed in a programming language such as C, Java, or OpenCL. Executable code may be, for example, any kind of software, firmware, script, module, or library, which, when suitably executed, processed, interpreted, or compiled at a virtual machine or other software environment, causes a processor of the computer system at which the executable code is supported to perform the tasks specified by the code.

It is also intended to encompass software, which defines a configuration of hardware as described herein, such as HDL (hardware description language) software, as is used for designing integrated circuits, or for configuring programmable chips, to carry out desired functions. That is, there may be provided a computer readable storage medium having encoded thereon computer readable program code in the form of an integrated circuit definition dataset that when processed (i.e., run) in an integrated circuit manufacturing system configures the system to manufacture a data processing apparatus configured to perform any of the methods described herein, or to manufacture a data processing apparatus comprising any apparatus described herein. An integrated circuit definition dataset may be, for example, an integrated circuit description.

Therefore, there may be provided a method of manufacturing, at an integrated circuit manufacturing system, a data processing apparatus as described herein. Furthermore, there may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, causes the method of manufacturing a data processing apparatus to be performed.

An integrated circuit definition dataset may be in the form of computer code, for example as a netlist, as code for configuring a programmable chip, as a hardware description language defining hardware suitable for manufacture in an integrated circuit at any level, including as register transfer level (RTL) code, as high-level circuit representations such as Verilog or VHDL, and as low-level circuit representations such as OASIS® and GDSII. Higher level representations, which logically define hardware suitable for manufacture in an integrated circuit (such as RTL), may be processed at a computer system configured for generating a manufacturing definition of an integrated circuit in the context of a software environment comprising definitions of circuit elements and rules for combining those elements in order to generate the manufacturing definition of an integrated circuit so defined by the representation. As is typically the case with software executing at a computer system so as to define a machine, one or more intermediate user steps (e.g., providing commands, variables etc.) may be required in order for a computer system configured for generating a manufacturing definition of an integrated circuit to execute code defining an integrated circuit so as to generate the manufacturing definition of that integrated circuit.

An example of processing an integrated circuit definition dataset at an integrated circuit manufacturing system so as to configure the system to manufacture a data processing apparatus will now be described with respect to FIG. 14.

FIG. 14 shows an example of an integrated circuit (IC) manufacturing system 1402, which is configured to manufacture a data processing apparatus as described in any of the examples herein. In particular, the IC manufacturing system 1402 comprises a layout processing system 1404 and an integrated circuit generation system 1406. The IC manufacturing system 1402 is configured to receive an IC definition dataset (e.g., defining a data processing apparatus as described in any of the examples herein), process the IC definition dataset, and generate an IC according to the IC definition dataset (e.g., which embodies a data processing apparatus as described in any of the examples herein). The processing of the IC definition dataset configures the IC manufacturing system 1402 to manufacture an integrated circuit embodying a data processing apparatus as described in any of the examples herein.

The layout processing system 1404 is configured to receive and process the IC definition dataset to determine a circuit layout. Methods of determining a circuit layout from an IC definition dataset are known in the art, and for example may involve synthesizing RTL code to determine a gate level representation of a circuit to be generated, e.g., in terms of logical components (e.g., NAND, NOR, AND, OR, MUX, and FLIP-FLOP components). A circuit layout can be determined from the gate level representation of the circuit by determining positional information for the logical components. This may be done automatically or with user involvement in order to optimize the circuit layout. When the layout processing system 1404 has determined the circuit layout, it may output a circuit layout definition to the IC generation system 1406. A circuit layout definition may be, for example, a circuit layout description.

The IC generation system 1406 generates an IC according to the circuit layout definition, as is known in the art. For example, the IC generation system 1406 may implement a semiconductor device fabrication process to generate the IC, which may involve a multiple-step sequence of photolithographic and chemical processing steps during which electronic circuits are gradually created on a wafer made of semiconducting material. The circuit layout definition may be in the form of a mask, which can be used in a lithographic process for generating an IC according to the circuit definition. Alternatively, the circuit layout definition provided to the IC generation system 1406 may be in the form of computer readable code, which the IC generation system 1406 can use to form a suitable mask for use in generating an IC.

The different processes performed by the IC manufacturing system 1402 may be implemented all in one location, e.g., by one party. Alternatively, the IC manufacturing system 1402 may be a distributed system such that some of the processes may be performed at different locations and may be performed by different parties. For example, some of the stages of: (i) synthesizing RTL code representing the IC definition dataset to form a gate level representation of a circuit to be generated, (ii) generating a circuit layout based on the gate level representation, (iii) forming a mask in accordance with the circuit layout, and (iv) fabricating an integrated circuit using the mask, may be performed in different locations and/or by different parties.

In other examples, processing of the integrated circuit definition dataset at an integrated circuit manufacturing system may configure the system to manufacture a data processing apparatus without the IC definition dataset being processed so as to determine a circuit layout. For instance, an integrated circuit definition dataset may define the configuration of a reconfigurable processor, such as an FPGA, and the processing of that dataset may configure an IC manufacturing system to generate a reconfigurable processor having that defined configuration (e.g., by loading configuration data to the FPGA).

In some embodiments, an integrated circuit manufacturing definition dataset, when processed in an integrated circuit manufacturing system, may cause an integrated circuit manufacturing system to generate a device as described herein. For example, the configuration of an integrated circuit manufacturing system in the manner described above with respect to FIG. 14 by an integrated circuit manufacturing definition dataset may cause a device as described herein to be manufactured.

In some examples, an integrated circuit definition dataset could include software that runs on hardware defined at the dataset or in combination with hardware defined at the dataset. In the example shown in FIG. 14, the IC generation system may further be configured by an integrated circuit definition dataset to, on manufacturing an integrated circuit, load firmware onto that integrated circuit in accordance with program code defined at the integrated circuit definition dataset or otherwise provide program code with the integrated circuit for use with the integrated circuit.

FIG. 15 is a flow diagram for saving registers to a stack associated with a data processing apparatus. The data processing apparatus includes a plurality of numbered registers, the plurality of numbered registers includes a plurality of preserved registers, and the plurality of preserved registers includes two or more non-contiguous blocks of registers. In embodiments, the plurality of preserved registers comprises one or more of: a saved register, a frame pointer register, and a return address register. The saving of registers to a stack can include saving non-contiguous blocks of preserved registers. The flow 1500 includes receiving an instruction for execution 1510 by an execution unit. The execution unit can be a unit of the data processing apparatus, a computer, and so on. The execution unit of the data processing apparatus can be configured within the data processing apparatus, where the data processing apparatus can include a reconfigurable fabric. The data processing apparatus can be based on a variety of architectures such as control flow or data flow architectures, etc. In embodiments, the architecture of the data processing apparatus can include an ISA, where the ISA can include a CISC, a RISC, and the like. The instruction that is received can include a data transfer instruction, a data manipulation instruction, a program control instruction, etc. The instruction may operate on a variety of data types, where the data types can include bits, bytes, integers, reals, floats, characters, strings, words, double words, etc.

The flow 1500 includes determining that the received instruction is a save instruction 1520. The save instruction can be used to save or store contiguous blocks of preserved registers or can be used to save or store non-contiguous blocks of preserved registers. The save instruction can identify a number of registers 1522 and a starting register. The number of registers to be stored can be based on the instruction that was received, the data type of the data that can be stored, the contents of the registers, and so on. In embodiments, the starting register is not one of the plurality of preserved registers. The starting register may not be one of the preserved registers when the starting register is not modified by a callee, a subroutine, and so on. In other embodiments, the starting register is one of the plurality of preserved registers. The starting register may be preserved when there is a chance or likelihood that the starting register can be manipulated or changed by the callee. The save instruction can include information regarding the save instruction, how the save instruction can interact with data or with registers, and so on. For a given save instruction, the registers to be saved can be either in sequence or out of sequence. In embodiments, the save instruction further indicates whether an out-of-sequence register is to be saved. The flow 1500 includes, in response to determining that the received save instruction indicates that an out-of-sequence register is to be saved, replacing a final register 1524 in the predetermined order with the out-of-sequence register. The out-of-sequence register can be a data register, an address register, an instruction counter, a PC, and the like. In embodiments, the out-of-sequence register is a GP register. The flow 1500 further includes determining whether the received instruction is a restore instruction 1530. The restore instruction can be used to restore contiguous blocks of preserved registers or can be used to restore non-contiguous blocks of preserved registers. The restore instruction can identify a number of registers and a starting register.
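By way of a non-limiting illustration, the replacement of the final register in the predetermined order with an out-of-sequence register can be sketched as follows. The function name, the flag name, and the use of register number 28 for the GP register are assumptions made only for this sketch and are not taken from the description above.

```python
# Minimal sketch of the out-of-sequence replacement step (assumed names).
GP_REGISTER = 28  # hypothetical register number for the GP register


def apply_out_of_sequence(order, save_out_of_sequence, out_of_sequence_reg=GP_REGISTER):
    """Return the register list, with the final entry replaced when requested.

    `order` is the list of register numbers already placed in the
    predetermined order; it is not modified in place.
    """
    if save_out_of_sequence and order:
        return order[:-1] + [out_of_sequence_reg]
    return list(order)


# Three registers in sequence, with and without the out-of-sequence flag.
in_sequence = apply_out_of_sequence([30, 31, 16], save_out_of_sequence=False)
with_gp = apply_out_of_sequence([30, 31, 16], save_out_of_sequence=True)
print(in_sequence)  # [30, 31, 16]
print(with_gp)      # [30, 31, 28]
```

In this sketch the number of registers saved is unchanged by the flag; only the identity of the final register differs.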

In response to determining the received instruction is a save instruction, the flow 1500 includes outputting one or more control signals 1540 to cause the execution unit to store 1542 to a stack the identified number of registers in a predetermined order beginning with the starting register. The registers that can be stored to the stack can include data registers, pointer registers, state registers such as a PC or an instruction counter, and so on. The store to the stack is executed, where in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register. The next highest numbered register can be a next contiguous register or can be a non-contiguous register. The outputting of control signals for storing of the registers can be iterative. In embodiments, outputting one or more control signals to cause the execution unit to store to the stack the identified number of registers in a predetermined order beginning with the starting register can include outputting, for each iteration 1544 of a number of iterations, one or more control signals to cause the execution unit to store one register at an address of the stack. The number of iterations of outputting control signals can be equal to the identified number of registers.
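As a non-limiting sketch, the predetermined order described above can be modeled by stepping from the starting register and wrapping from the highest numbered preserved register to the lowest numbered preserved register. The function and parameter names, and the example layout in which the preserved registers occupy blocks r16 to r23 and r30 to r31, are assumptions introduced for illustration only.

```python
def predetermined_order(r_start, num_reg, lowest_preserved, highest_preserved):
    """Return the register numbers selected by a save or restore instruction."""
    order = []
    reg = r_start
    for _ in range(num_reg):
        order.append(reg)
        # The highest numbered preserved register is followed by the lowest
        # numbered preserved register; any other register is followed by the
        # next highest numbered register.
        reg = lowest_preserved if reg == highest_preserved else reg + 1
    return order


# Hypothetical layout: preserved blocks r16-r23 and r30-r31.
LOWEST, HIGHEST = 16, 31
print(predetermined_order(30, 4, LOWEST, HIGHEST))  # [30, 31, 16, 17]
```

With this layout, a single instruction naming r30 as the starting register and a count of four reaches registers in both preserved blocks.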

The control signals can be output for various instructions in addition to save or store instructions. In embodiments, in response to determining the received instruction is a restore instruction, the flow includes outputting one or more control signals 1540 to cause the execution unit to restore 1546 from a stack the number of registers in a predetermined order beginning with the starting register. The restoring can be performed based on various data restoration techniques. The restoration techniques can include sequential or contiguous restoration from the stack to the registers, non-contiguous restoration, and the like. The restoring of registers from the stack can be executed, where in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register. As with the save instruction, the registers of the restore instruction can be contiguous or non-contiguous.

The flow 1500 further includes tracking a number of registers that have been stored using a counter 1550. The number of registers that can be stored can vary based on the type of store instruction that is executed, the data type or types of the data to be stored, what registers may be manipulated by a callee such as a subroutine, and so on. Likewise, the number of registers that can be restored can vary for similar reasons. The counter can be used to keep track of how many registers need to be stored before the callee executes, since the callee may manipulate the registers. As discussed above, the storing of registers to the stack can be iterative. The flow 1500 further includes identifying the register 1552 to be stored in the stack in a particular iteration based on the starting register and the counter. The identifying of the register to be stored can include determining a number of the register. In the flow 1500, identifying the register to be stored in the stack in a particular iteration includes identifying a number of the register 1554. Each register may have a number, where the number of the register can include an address, a relative address, an offset, a reference, or the like. In embodiments, the number of the register to be stored in the stack in a particular iteration is identified as a sum of a number of the starting register and the counter when the sum is less than the highest preserved register number plus one. Otherwise, in embodiments, the number of the register to be stored in the stack in a particular iteration is identified as a sum of a lowest preserved register number and a remainder of the sum of the number of the starting register and the counter divided by the highest preserved register number plus one. Other arithmetic operations including subtraction can be performed to identify the number of the register. The flow 1500 further includes identifying the address of the stack 1556 for a particular iteration based on the counter. Recall that the counter value can be based on the type of store instruction executed, the data type of the data within registers, and so on. The address of the stack for a particular iteration can be based on a stack pointer and an offset based on the counter. Addresses of the stack for consecutive iterations may be contiguous or non-contiguous. In embodiments, the offset can be a sum of the counter and one, multiplied by a size of an entry in the stack. Various steps in the flow 1500 may be changed in order, repeated, omitted, or the like without departing from the disclosed concepts.
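The per-iteration arithmetic described above can be sketched, without limitation, as follows. The function names, the 4-byte entry size, the example stack pointer value, and the choice to subtract the offset from the stack pointer (that is, a stack that grows toward lower addresses) are assumptions made for this sketch; the description above states only that the address is based on the stack pointer and the offset.

```python
def register_number(r_start, counter, lowest_preserved, highest_preserved):
    """Number of the register handled in the iteration given by `counter`."""
    total = r_start + counter
    if total < highest_preserved + 1:
        return total
    # Wrap past the highest preserved register to the lowest preserved one.
    return lowest_preserved + total % (highest_preserved + 1)


def save_offset(counter, entry_size):
    """Offset for an iteration: (counter + 1) multiplied by the entry size."""
    return (counter + 1) * entry_size


def save_address(stack_pointer, counter, entry_size):
    # Assumption: the offset is subtracted because the stack grows downward.
    return stack_pointer - save_offset(counter, entry_size)


# Hypothetical values: preserved range 16..31, 4-byte entries, sp = 0x1000.
for counter in range(3):
    reg = register_number(30, counter, 16, 31)
    addr = save_address(0x1000, counter, 4)
    print(counter, reg, hex(addr))
# 0 30 0xffc
# 1 31 0xff8
# 2 16 0xff4
```

For the third iteration the sum 30 + 2 = 32 is not less than 31 + 1, so the register number is 16 plus the remainder of 32 divided by 32, which is 16.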

FIG. 16 is a flow diagram for restoring registers from a stack associated with a data processing apparatus. The data processing apparatus can be the same data processing apparatus associated with storing registers or can be a different data processing apparatus. The data processing apparatus includes a plurality of numbered registers, the plurality of numbered registers includes a plurality of preserved registers, and the plurality of preserved registers includes two or more non-contiguous blocks of registers. In embodiments, the plurality of preserved registers comprises one or more of: a saved register, a frame pointer register, and a return address register. The restoring of registers from a stack can include restoring non-contiguous blocks of preserved registers. The flow 1600 includes receiving an instruction for execution 1610 by an execution unit. The execution unit can be a unit of the data processing apparatus, a computer, and so on. The instruction that is received can include a data transfer instruction, a data manipulation instruction, a program control instruction, and the like. The instruction may operate on a variety of data types, where the data types can include bits, bytes, integers, reals, floats, characters, strings, words, double words, etc.

The flow 1600 includes determining that the received instruction is a restore instruction 1620. The restore instruction can be used to restore contiguous blocks of preserved registers, non-contiguous blocks of preserved registers, and the like. The restore instruction can identify a number of registers 1622 and a starting register. As with the save instruction described elsewhere, the number of registers to be restored can be based on the instruction that was received, the data type of the data that can be restored, the contents of the stack, and so on. In embodiments, the starting register is not one of the plurality of preserved registers. When the starting register was not modified by a callee, a subroutine, a function, and so on, the starting register may not be one of the preserved registers. In other embodiments, the starting register is one of the plurality of preserved registers. The starting register may be preserved when there is a chance or likelihood that the starting register was manipulated or changed by the callee. For a given restore instruction, the registers to be restored can be either in sequence or out of sequence. In embodiments, the restore instruction further indicates whether an out-of-sequence register is to be restored. The flow 1600 includes, in response to determining that the received restore instruction indicates that an out-of-sequence register is to be restored, replacing a final register 1624 in the predetermined order with the out-of-sequence register. The out-of-sequence register can be a data register, an address register, an instruction counter, a PC, and the like. In embodiments, the out-of-sequence register is a GP register.

In response to determining the received instruction is a restore instruction, the flow 1600 includes outputting one or more control signals 1630 to cause the execution unit to restore 1632 from a stack the identified number of registers in a predetermined order beginning with the starting register 1634. The registers that can be restored from the stack can include data registers, pointer registers, state registers such as a PC or an instruction counter, and so on. The restore from the stack is executed by the data processing apparatus, where in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register. The next highest numbered register can be a next contiguous register or can be a non-contiguous register. The outputting of control signals for restoring of the registers from the stack can be iterative. In embodiments, outputting one or more control signals to cause the execution unit to restore from the stack the identified number of registers in a predetermined order beginning with the starting register comprises outputting, for each iteration 1636 of a number of iterations, one or more control signals to cause the execution unit to restore one register from an address of the stack. The number of iterations of outputting control signals can be equal to the identified number of registers.

The flow 1600 further includes tracking a number of registers that have been restored using a counter 1640. The number of registers that can be restored can vary based on the type of restore instruction that is executed, the data type or types of the data to be restored, and so on. The counter can be used to keep track of how many registers need to be restored after the callee executes, since the callee may have manipulated the registers. As discussed above, the restoring of registers from the stack can be iterative. The flow 1600 further includes identifying the register 1642 to be restored from the stack in a particular iteration based on the starting register and the counter. The identifying of the register to be restored can include determining a number of the register. In the flow 1600, identifying the register to be restored from the stack in a particular iteration comprises identifying a number 1644 of the register to be restored from the stack in a particular iteration. Each register may have a number, where the number of the register can include an address, a relative address, an offset, a reference, or the like. In embodiments, the number of the register to be restored from the stack in a particular iteration is a sum of a number of the starting register and the counter when the sum is less than the highest preserved register number plus one. Otherwise, in embodiments, the number of the register to be restored from the stack in a particular iteration is identified as a sum of a lowest preserved register number and a remainder of the sum of the number of the starting register and the counter divided by the highest preserved register number plus one. Other arithmetic operations including subtraction can be performed to identify the number of the register.

The flow 1600 further includes identifying the address of the stack 1646 for a particular iteration based on the counter. Recall that the counter value can be based on the type of restore instruction executed, the data type of the data within registers, and so on. The address of the stack for a particular iteration can be based on a stack pointer and an offset based on the counter. Addresses of the stack for consecutive iterations may be contiguous or non-contiguous. In embodiments, the offset can be a sum of a size of a current stack frame and the product of the sum of the counter and one and a size of an entry in the stack. Various steps in the flow 1600 may be changed in order, repeated, omitted, or the like without departing from the disclosed concepts.
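As a simplified, non-limiting sketch, a save followed by a restore can be modeled in software to show that the same counter-driven register identification is used in both directions. The sketch assumes a stack frame size of zero, an unchanged stack pointer between the save and the restore, a stack that grows toward lower addresses, 4-byte entries, and a preserved register range of 16 to 31; all names and values are hypothetical and are not taken from the description above.

```python
def register_number(r_start, counter, lowest, highest):
    """Register handled in a given iteration, with wrap-around."""
    total = r_start + counter
    if total < highest + 1:
        return total
    return lowest + total % (highest + 1)


def slot_address(sp, counter, entry_size):
    # Assumption: frame size of zero and a downward-growing stack.
    return sp - (counter + 1) * entry_size


def save(regs, memory, sp, r_start, num_reg, lowest, highest, entry_size=4):
    """Store one register per iteration; the counter tracks progress."""
    for counter in range(num_reg):
        reg = register_number(r_start, counter, lowest, highest)
        memory[slot_address(sp, counter, entry_size)] = regs[reg]


def restore(regs, memory, sp, r_start, num_reg, lowest, highest, entry_size=4):
    """Load one register per iteration from the slot the save used."""
    for counter in range(num_reg):
        reg = register_number(r_start, counter, lowest, highest)
        regs[reg] = memory[slot_address(sp, counter, entry_size)]


regs = {n: 100 + n for n in range(32)}   # modeled register file
original = dict(regs)
memory = {}                              # modeled stack memory

save(regs, memory, 0x1000, 30, 3, 16, 31)
for n in (30, 31, 16):                   # a callee overwrites the registers
    regs[n] = 0
restore(regs, memory, 0x1000, 30, 3, 16, 31)

round_trip_ok = regs == original
print(round_trip_ok)  # True
```

The round trip succeeds because the save and the restore identify the same register and the same stack entry for each value of the counter.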

FIG. 17 is a system diagram for saving registers to a stack or restoring registers from a stack. The stack can be associated with a data processing apparatus that includes a plurality of numbered registers, the plurality of numbered registers including a plurality of preserved registers, and the plurality of preserved registers including two or more non-contiguous blocks of registers. The system 1700 can include one or more processors 1710 coupled to a memory 1712, which stores instructions. The system 1700 can include a display 1714 coupled to the one or more processors 1710 for displaying data, intermediate steps, instructions, PCs, instruction counters, control signals, pointer values, stack values, stack frame values, and so on. In embodiments, one or more processors 1710 are attached to the memory 1712 where the one or more processors, when executing the instructions which are stored, are configured to: provide access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receive an instruction for execution by an execution unit; determine whether the received instruction is a save instruction, the save instruction identifying a number of registers and a starting register within the plurality of numbered registers; and output one or more control signals to cause the execution unit to store to a stack the identified number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register. The system 1700 can include a collection of instructions and data 1720. The instructions and data 1720 may be stored in a database, one or more statically linked libraries, one or more dynamically linked libraries, precompiled headers, source code, or other suitable formats. The instructions can include instructions for saving and restoring non-contiguous blocks of preserved registers. The processing apparatus may be realized using processing elements within a reconfigurable fabric.

In other embodiments, one or more processors 1710 are attached to the memory 1712 where the one or more processors, when executing the instructions which are stored, are configured to: provide access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receive an instruction for execution by an execution unit; determine whether the received instruction is a restore instruction, the restore instruction identifying a number of registers and a starting register within the plurality of numbered registers; and output one or more control signals to cause the execution unit to restore from a stack the identified number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.

The system 1700 can include a receiving component 1730. The receiving component can include functions and instructions for receiving an instruction for execution by an execution unit. The received instruction can be any of a variety of instructions, where the variety of instructions can include data transfer instructions, data manipulation instructions, program control instructions, and so on. The instructions can include instructions for processing data, analyzing data, comparing data, and so on. The received instruction can be a save or store instruction, a write or restore instruction, etc. The system 1700 can include a determining component 1740. The determining component can include functions and instructions for determining whether the received instruction is a save instruction, the save instruction identifying a number of registers and a starting register. The identified number of registers and the starting register can be used for saving or storing data to a stack, where the data can include bytes, words, double words, and so on. The data can include register contents of the data processing apparatus. The determining component can include functions and instructions for determining whether the received instruction is a restore instruction, the restore instruction identifying a number of registers and a starting register. The identified number of registers and the starting register can be used to restore register values or data from a stack to the data processing apparatus.

The system 1700 can include an outputting component 1750. The outputting component can include functions and instructions for outputting one or more control signals to cause the execution unit to store to a stack the identified number of registers in a predetermined order beginning with the starting register, wherein in the predetermined order each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register. Other control signals can be output. The outputting component can include functions and instructions for outputting one or more control signals to cause the execution unit to restore from a stack the identified number of registers in a predetermined order beginning with the starting register, wherein in the predetermined order each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.
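For illustration only, the cooperation of the receiving, determining, and outputting components can be modeled in software as shown below. The instruction representation, the opcode strings, the tuple form of a control signal, the 4-byte entry size, and the preserved register range of 16 to 31 are all assumptions introduced for this sketch; they do not describe an actual instruction encoding or control signal format.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    opcode: str       # hypothetical mnemonic, e.g. "save", "restore", "add"
    r_start: int = 0  # starting register (rStart)
    num_reg: int = 0  # number of registers (Num_Reg)


def determine(instr):
    """Determining component: report "save", "restore", or None."""
    return instr.opcode if instr.opcode in ("save", "restore") else None


def output_control_signals(instr, lowest, highest, entry_size=4):
    """Outputting component: one modeled control signal per iteration.

    Each signal is a tuple (action, register number, stack offset).
    """
    kind = determine(instr)
    if kind is None:
        return []  # not a save or restore instruction
    action = "store" if kind == "save" else "load"
    signals = []
    reg = instr.r_start
    for counter in range(instr.num_reg):
        signals.append((action, reg, (counter + 1) * entry_size))
        # Wrap from the highest preserved register to the lowest one.
        reg = lowest if reg == highest else reg + 1
    return signals


# Receiving component, modeled as a list of received instructions.
received = [Instruction("save", 30, 3), Instruction("add"), Instruction("restore", 30, 3)]
for instr in received:
    print(instr.opcode, output_control_signals(instr, 16, 31))
# save [('store', 30, 4), ('store', 31, 8), ('store', 16, 12)]
# add []
# restore [('load', 30, 4), ('load', 31, 8), ('load', 16, 12)]
```

In this model the number of control signals output equals the identified number of registers, and no signals are output for an instruction that is neither a save nor a restore instruction.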

The system 1700 can include a computer program product embodied in a non-transitory computer readable medium for saving registers to a stack, the computer program product comprising code which causes one or more processors to perform operations of: providing access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receiving an instruction for execution by an execution unit; determining whether the received instruction is a save instruction, the save instruction identifying a number of registers and a starting register within the plurality of numbered registers; and outputting one or more control signals to cause the execution unit to store to a stack the identified number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.

The system 1700 can include a computer program product embodied in a non-transitory computer readable medium for restoring registers from a stack, the computer program product comprising code which causes one or more processors to perform operations of: providing access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receiving an instruction for execution by an execution unit; determining whether the received instruction is a restore instruction, the restore instruction identifying a number of registers and a starting register within the plurality of numbered registers; and outputting one or more control signals to cause the execution unit to restore from a stack the identified number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.

Each of the above methods may be executed on one or more processors on one or more computer systems. Embodiments may include various forms of distributed computing, client/server computing, and cloud-based computing. Further, it will be understood that the depicted steps or boxes contained in this disclosure's flow charts are solely illustrative and explanatory. The steps may be modified, omitted, repeated, or re-ordered without departing from the scope of this disclosure. Further, each step may contain one or more sub-steps. While the foregoing drawings and description set forth functional aspects of the disclosed systems, no particular implementation or arrangement of software and/or hardware should be inferred from these descriptions unless explicitly stated or otherwise clear from the context. All such arrangements of software and/or hardware are intended to fall within the scope of this disclosure.

The block diagrams and flowchart illustrations depict methods, apparatus, systems, and computer program products. The elements and combinations of elements in the block diagrams and flow diagrams show functions, steps, or groups of steps of the methods, apparatus, systems, computer program products and/or computer-implemented methods. Any and all such functions, generally referred to herein as a “circuit,” “module,” or “system,” may be implemented by computer program instructions, by special-purpose hardware-based computer systems, by combinations of special purpose hardware and computer instructions, by combinations of general-purpose hardware and computer instructions, and so on.

A programmable apparatus that executes any of the above-mentioned computer program products or computer-implemented methods may include one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors, programmable devices, programmable gate arrays, programmable array logic, memory devices, application specific integrated circuits, or the like. Each may be suitably employed or configured to process computer program instructions, execute computer logic, store computer data, and so on.

It will be understood that a computer may include a computer program product from a computer-readable storage medium and that this medium may be internal or external, removable and replaceable, or fixed. In addition, a computer may include a Basic Input/Output System (BIOS), firmware, an operating system, a database, or the like that may include, interface with, or support the software and hardware described herein.

Embodiments of the present invention are limited to neither conventional computer applications nor the programmable apparatus that run them. To illustrate: the embodiments of the presently claimed invention could include an optical computer, quantum computer, analog computer, or the like. A computer program may be loaded onto a computer to produce a particular machine that may perform any and all of the depicted functions. This particular machine provides a means for carrying out any and all of the depicted functions.

Any combination of one or more computer readable media may be utilized including but not limited to: a non-transitory computer readable medium for storage; an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor computer readable storage medium or any suitable combination of the foregoing; a portable computer diskette; a hard disk; a random access memory (RAM); a read-only memory (ROM); an erasable programmable read-only memory (EPROM, Flash, MRAM, FeRAM, or phase change memory); an optical fiber; a portable compact disc; an optical storage device; a magnetic storage device; or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

It will be appreciated that computer program instructions may include computer executable code. A variety of languages for expressing computer program instructions may include without limitation C, C++, Java, JavaScript™, ActionScript™, assembly language, Lisp, Perl, Tcl, Python, Ruby, hardware description languages, database programming languages, functional programming languages, imperative programming languages, and so on. In embodiments, computer program instructions may be stored, compiled, or interpreted to run on a computer, a programmable data processing apparatus, a heterogeneous combination of processors or processor architectures, and so on. Without limitation, embodiments of the present invention may take the form of web-based computer software, which includes client/server software, software-as-a-service, peer-to-peer software, or the like.

In embodiments, a computer may enable execution of computer program instructions including multiple programs or threads. The multiple programs or threads may be processed approximately simultaneously to enhance utilization of the processor and to facilitate substantially simultaneous functions. By way of implementation, any and all methods, program codes, program instructions, and the like described herein may be implemented in one or more threads, which may in turn spawn other threads, which may themselves have priorities associated with them. In some embodiments, a computer may process these threads based on priority or other order.

Unless explicitly stated or otherwise clear from the context, the verbs “execute” and “process” may be used interchangeably to indicate execute, process, interpret, compile, assemble, link, load, or a combination of the foregoing. Therefore, embodiments that execute or process computer program instructions, computer-executable code, or the like may act upon the instructions or code in any and all of the ways described. Further, the method steps shown are intended to include any suitable method of causing one or more parties or entities to perform the steps. The parties performing a step, or portion of a step, need not be located within a particular geographic location or country boundary. For instance, if an entity located within the United States causes a method step, or portion thereof, to be performed outside of the United States then the method is considered to be performed in the United States by virtue of the causal entity.

While the invention has been disclosed in connection with preferred embodiments shown and described in detail, various modifications and improvements thereon will become apparent to those skilled in the art. Accordingly, the foregoing examples should not limit the spirit and scope of the present invention; rather it should be understood in the broadest sense allowable by law. 

What is claimed is:
 1. A processor-implemented method of saving registers to a stack comprising: providing access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receiving an instruction for execution by an execution unit; determining whether the received instruction is a save instruction, the save instruction identifying a number of registers and a starting register within the plurality of numbered registers; and outputting one or more control signals to cause the execution unit to store to a stack the identified number of registers in a predetermined order beginning with the starting register when the determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.
 2. The method of claim 1 wherein the outputting one or more control signals to cause the execution unit to store to the stack the identified number of registers in a predetermined order beginning with the starting register further comprises outputting, for each iteration of a number of iterations, one or more control signals to cause the execution unit to store one register at an address of the stack, wherein the number of iterations is equal to the identified number of registers.
 3. The method of claim 2 further comprising tracking a number of registers that have been stored using a counter.
 4. The method of claim 3 further comprising identifying the register to be stored in the stack in a particular iteration based on the starting register and the counter.
 5. The method of claim 4 wherein identifying the register to be stored in the stack in a particular iteration comprises identifying a number of the register to be stored in the stack in a particular iteration as: a sum of a number of the starting register and the counter when the sum is less than the highest preserved register number plus one; and a sum of a lowest preserved register number plus a remainder of the sum of the number of the starting register and the counter divided by a highest preserved register number plus one, otherwise.
 6. The method of claim 3 further comprising identifying the address of the stack for a particular iteration based on the counter.
 7. The method of claim 6 wherein the address of the stack for a particular iteration is based on a stack pointer and an offset based on the counter.
 8. The method of claim 7 wherein the offset is a product of (i) a sum of the counter and one and (ii) a size of an entry in the stack.
 9. The method of claim 1 wherein the save instruction further indicates whether or not an out of sequence register is to be saved, and the method further comprises replacing a final register in the predetermined order with the out of sequence register when it is determined that the received save instruction indicates that an out of sequence register is to be saved.
 10. The method of claim 9 wherein the out of sequence register is a global pointer register.
 11. The method of claim 1 wherein the starting register is not one of the plurality of preserved registers.
 12. The method of claim 1 wherein the starting register is one of the plurality of preserved registers.
 13. The method of claim 1 wherein the plurality of preserved registers comprises one or more of: a saved register, a frame pointer register, and a return address register.
 14. The method of claim 1 further comprising: determining whether the received instruction is a restore instruction, the restore instruction identifying a number of registers and a starting register; and outputting one or more control signals to cause the execution unit to restore from a stack the number of registers in a predetermined order beginning with the starting register when it is determined that the received instruction is a restore instruction, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.
 15. A processor-implemented method of restoring registers from a stack comprising: providing access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receiving an instruction for execution by an execution unit; determining whether the received instruction is a restore instruction, the restore instruction identifying a number of registers and a starting register within the plurality of numbered registers; and outputting one or more control signals to cause the execution unit to restore from a stack the identified number of registers in a predetermined order beginning with the starting register when determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.
 16. The method of claim 15 wherein the outputting one or more control signals to cause the execution unit to restore from the stack the identified number of registers in a predetermined order beginning with the starting register further comprises outputting, for each iteration of a number of iterations, one or more control signals to cause the execution unit to restore one register from an address of the stack, wherein the number of iterations is equal to the identified number of registers.
 17. The method of claim 16 further comprising tracking a number of registers that have been restored using a counter.
 18. The method of claim 17 further comprising identifying the register to be restored from the stack in a particular iteration based on the starting register and the counter.
 19. The method of claim 18, wherein identifying the register to be restored from the stack in a particular iteration comprises identifying a number of the register to be restored from the stack in a particular iteration as: a sum of a number of the starting register and the counter when the sum is less than the highest preserved register number plus one; and a sum of a lowest preserved register number plus a remainder of the sum of the number of the starting register and the counter divided by a highest preserved register number plus one, otherwise.
 20. The method of claim 17 further comprising identifying the address of the stack for a particular iteration based on the counter.
 21-22. (canceled)
 23. The method of claim 15 wherein the restore instruction further indicates whether an out of sequence register is to be restored, and the method further comprises replacing a final register in the predetermined order with the out of sequence register when it is determined that the received restore instruction indicates that an out of sequence register is to be restored.
 24-29. (canceled)
 30. A computer system for saving registers to a stack comprising: a memory which stores instructions; one or more processors attached to the memory wherein the one or more processors, when executing the instructions which are stored, are configured to: provide access to a plurality of numbered registers, the plurality of numbered registers comprising a plurality of preserved registers, the plurality of preserved registers comprising two or more non-contiguous blocks of registers; receive an instruction for execution by an execution unit; determine whether the received instruction is a save instruction, the save instruction identifying a number of registers and a starting register within the plurality of numbered registers; and output one or more control signals to cause the execution unit to store to a stack the identified number of registers in a predetermined order beginning with the starting register when determination is positive, wherein, in the predetermined order, each register is followed by a next highest numbered register except a highest numbered preserved register, which is followed by a lowest numbered preserved register.
 31-33. (canceled)
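
By way of illustration only, and without limiting the claims, the register ordering recited in claims 5 and 19, the offset recited in claim 8, and the out of sequence replacement recited in claims 9 and 23 can be sketched in Python as follows. The register numbers (16 and 31 as the lowest and highest preserved registers, 28 as the global pointer register), the 4-byte entry size, the subtraction of the offset from the stack pointer, and all function and constant names are assumptions chosen for the example; none of them is fixed by the claims.

```python
# Illustrative values only; the claims do not fix these numbers.
LOWEST_PRESERVED = 16   # assumed lowest numbered preserved register
HIGHEST_PRESERVED = 31  # assumed highest numbered preserved register
GP_REG = 28             # assumed number of the out of sequence (global pointer) register
ENTRY_SIZE = 4          # assumed size of one stack entry, in bytes


def register_for_iteration(r_start, counter):
    """Register number for one iteration, following the rule of claims 5 and 19."""
    total = r_start + counter
    if total < HIGHEST_PRESERVED + 1:
        return total
    # Past the highest preserved register: wrap to the lowest preserved register.
    return LOWEST_PRESERVED + total % (HIGHEST_PRESERVED + 1)


def stack_offset(counter):
    """Offset of claim 8: the counter plus one, multiplied by the entry size."""
    return (counter + 1) * ENTRY_SIZE


def save_plan(r_start, num_reg, sp, save_gp=False):
    """Return (register, address) pairs in the order a save would store them."""
    plan = []
    for counter in range(num_reg):
        reg = register_for_iteration(r_start, counter)
        if save_gp and counter == num_reg - 1:
            # Claim 9: the final register is replaced by the out of sequence register.
            reg = GP_REG
        # Assumption: the offset is subtracted from the stack pointer.
        plan.append((reg, sp - stack_offset(counter)))
    return plan


if __name__ == "__main__":
    for reg, addr in save_plan(r_start=30, num_reg=4, sp=0x1000):
        print(f"r{reg} -> {addr:#x}")
```

Under these assumptions, a save of four registers starting at register 30 visits registers 30, 31, 16, and 17 in that order, so a single starting register and count cover a run that spans the wrap from the highest to the lowest preserved register. A restore would walk the same sequence and load from the same addresses.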